What is Feedfetcher-Google?

Feedfetcher-Google is a specialized fetcher operated by Google that retrieves and processes RSS and Atom feeds from websites. It was first documented around 2009 and falls under the category of user-triggered fetchers rather than automated web crawlers. It retrieves RSS or Atom feeds when users explicitly request them through Google services like Google News, or when publishers use technologies like PubSubHubbub (now standardized as WebSub).

The bot identifies itself in server logs with the user agent string "Feedfetcher-Google", often followed by additional details such as a feed ID. Unlike standard web crawlers that discover content autonomously, Feedfetcher retrieves only feeds that human users have specifically requested through Google's services, so it functions as a direct agent of those users rather than as an autonomous crawler.
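To spot these requests in your own logs, a check along the following lines may help. This is a minimal Python sketch, not an official parser: it assumes the user agent contains the token Feedfetcher-Google (matched case-insensitively, since the capitalization has varied) and that any subscriber count or feed ID appears as "N subscribers" and "feed-id=..." fields. Those field formats are assumptions, and many requests may carry neither.

```python
import re
from typing import Optional

# Match the product token case-insensitively ("Feedfetcher" vs "FeedFetcher").
_FEEDFETCHER_RE = re.compile(r"feedfetcher-google", re.IGNORECASE)

# Assumed optional fields; absence is normal and yields None below.
_SUBSCRIBERS_RE = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)
_FEED_ID_RE = re.compile(r"feed-id=([\w-]+)", re.IGNORECASE)


def parse_feedfetcher_ua(user_agent: str) -> Optional[dict]:
    """Return extracted details if the UA names Feedfetcher, else None."""
    if not _FEEDFETCHER_RE.search(user_agent):
        return None
    subs = _SUBSCRIBERS_RE.search(user_agent)
    feed_id = _FEED_ID_RE.search(user_agent)
    return {
        "subscribers": int(subs.group(1)) if subs else None,
        "feed_id": feed_id.group(1) if feed_id else None,
    }
```

Because user agent strings can be set by anyone, a match here only tells you what the client claims to be.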

Feedfetcher is designed to run on distributed machines to improve performance and reduce bandwidth usage, with servers often positioned close to the sites they're retrieving from. Once it collects feeds, it stores and periodically refreshes them to keep the content current for users of Google's services.

Why is Feedfetcher-Google crawling my site?

Feedfetcher-Google is visiting your site because one or more users have explicitly requested your RSS or Atom feeds through a Google service. These visits are always user-initiated: someone has added your feed to a Google service that displays feed content. The bot isn't discovering your content on its own; it's responding to specific user requests.

The frequency of visits depends on how often your content is updated and how many users have subscribed to your feeds. For most sites, Feedfetcher shouldn't retrieve feeds more than once every hour on average, though frequently updated sites may see more frequent visits. Network delays can occasionally make it appear that Feedfetcher is retrieving your feeds more often than it actually is.
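To check whether visits really average out to about once an hour, you could compute the mean gap between Feedfetcher requests for each feed URL in your access logs. The sketch below is a hypothetical helper, mean_fetch_interval, which assumes you have already parsed your logs into (path, timestamp) pairs and filtered them down to Feedfetcher requests.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def mean_fetch_interval(entries: Iterable[Tuple[str, datetime]]) -> Dict[str, float]:
    """Mean seconds between consecutive fetches, per feed path.

    Paths fetched fewer than two times are omitted, since they have no interval.
    """
    by_path: Dict[str, List[datetime]] = defaultdict(list)
    for path, ts in entries:
        by_path[path].append(ts)

    result: Dict[str, float] = {}
    for path, stamps in by_path.items():
        if len(stamps) < 2:
            continue
        stamps.sort()  # log lines are not guaranteed to be in order
        # The mean of consecutive gaps equals (last - first) / (n - 1).
        span = (stamps[-1] - stamps[0]).total_seconds()
        result[path] = span / (len(stamps) - 1)
    return result
```

Averaging over the whole window smooths out the occasional burst of closely spaced requests that network delays can produce; a result near 3600 seconds means roughly hourly fetching.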

If you notice Feedfetcher attempting to access feeds that don't exist or accessing areas of your site that aren't public feeds, this is likely because a user has requested a non-existent or incorrect URL. Even "secret" server URLs might be accessed if a user somehow knows about them and has requested them through a Google service.

What is the purpose of Feedfetcher-Google?

Feedfetcher-Google serves as the mechanism by which Google delivers requested feed content to users. Its primary purpose is to retrieve, store, and refresh RSS or Atom feeds that users have explicitly added to Google services like Google News or other feed-reading products.

Unlike general web crawlers that index content for search results, Feedfetcher typically doesn't index the feeds it collects in Google Search or other search services (with podcast feeds being an exception). Instead, it acts as an intermediary, fetching content on behalf of users and delivering it to their Google services.

By collecting a feed once for all the users who have requested it, Feedfetcher conserves bandwidth while still providing timely updates. This benefits website owners by reducing the total number of requests to their servers compared with each user's client fetching the feed independently.
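The saving is simple arithmetic. The hypothetical daily_requests function below assumes every subscriber's client would poll at the same fixed rate and that a shared fetcher polls once per interval regardless of subscriber count; real polling schedules vary, so treat the numbers as illustrative.

```python
def daily_requests(subscribers: int, polls_per_day: int, shared: bool) -> int:
    """Requests per day reaching the origin server for one feed.

    Simplified model: with independent clients, each subscriber polls at
    polls_per_day; with a shared fetcher, one poll serves everyone.
    """
    if subscribers <= 0:
        return 0
    return polls_per_day if shared else subscribers * polls_per_day


independent = daily_requests(500, 24, shared=False)
pooled = daily_requests(500, 24, shared=True)
print(f"independent: {independent}, shared: {pooled}")
```

With 500 subscribers polling hourly, independent clients would make 500 × 24 = 12,000 requests a day, while a single shared fetcher makes 24.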

For publishers, Feedfetcher's activity indicates that users are actively subscribing to and reading their content through Google's ecosystem, potentially expanding their audience reach.

How do I block Feedfetcher-Google?

Unlike most Google crawlers, Feedfetcher-Google does not respect robots.txt rules. This is because Feedfetcher acts as a direct agent of human users rather than an automated crawler. When users explicitly request your feed content through Google services, Feedfetcher attempts to retrieve it regardless of robots.txt directives.

If you need to prevent Feedfetcher from accessing your feeds, you'll need to implement server-side controls. One effective approach is to configure your server to return an error status code specifically for the Feedfetcher-Google user agent. For example, you can set up your server to return a 404 (Not Found) or 410 (Gone) response when it detects this user agent.
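As one illustration, here is a minimal Python WSGI middleware that answers 410 Gone when the user agent contains Feedfetcher-Google and the request targets one of your feed paths. The paths in BLOCKED_FEED_PATHS and the function name are hypothetical placeholders; equivalent rules can usually be written in your web server's own configuration instead.

```python
from typing import Callable

# Hypothetical feed paths to withhold from Feedfetcher; replace with your own.
BLOCKED_FEED_PATHS = {"/feed.xml", "/atom.xml"}


def block_feedfetcher(app: Callable) -> Callable:
    """Wrap a WSGI app so Feedfetcher requests for feed paths get 410 Gone."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "")
        # Case-insensitive check, since the token's capitalization has varied.
        if "feedfetcher-google" in ua.lower() and path in BLOCKED_FEED_PATHS:
            body = b"Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other request passes through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

A 410 signals that the feed is intentionally gone, while a 404 only says it was not found; either stops delivery. Matching on the user agent alone does not verify that a request really comes from Google.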

Keep in mind that blocking Feedfetcher means users who have added your feeds to Google services will no longer receive updates. This could reduce your content's visibility and reach among audiences who prefer to consume content through Google's feed-reading services.

If your feed is provided through a blog or site hosting service rather than your own server, you'll need to work directly with that service to implement any access restrictions, as you may not have direct control over server configurations.

The IP addresses used by Feedfetcher are included in Google's published list of user-triggered fetchers, which can be helpful if you need to implement more granular access controls at the network level.
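To confirm that a request claiming to be Feedfetcher really comes from Google's network, you can check the client IP against the published ranges. The sketch below uses Python's standard ipaddress module and assumes the list is a JSON document with a top-level "prefixes" array whose entries carry either an "ipv4Prefix" or an "ipv6Prefix" key; verify that layout against the file you actually download. The function names are illustrative.

```python
import ipaddress
import json
from typing import List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_prefixes(document: str) -> List[IPNetwork]:
    """Parse an IP-range JSON document (assumed layout) into network objects."""
    data = json.loads(document)
    networks: List[IPNetwork] = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_user_triggered_fetcher(ip: str, networks: List[IPNetwork]) -> bool:
    """True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Membership across IPv4/IPv6 simply returns False, so mixed lists are safe.
    return any(addr in net for net in networks)


# Example with documentation-only ranges, not real Google addresses.
sample = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
nets = load_prefixes(sample)
print(is_user_triggered_fetcher("192.0.2.10", nets))
```

Since the ranges may change over time, refreshing the list periodically is safer than hard-coding addresses.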
