Superfeedr bot
What is Superfeedr?
Superfeedr is a specialized feed aggregation service that simplifies how websites and applications handle RSS, Atom, and JSON feeds. Founded by Julien Genestoux, Superfeedr was acquired by Medium in 2016. It operates as a unified Feed API that monitors, parses, and delivers real-time updates from web feeds to subscribers.
Technically classified as a feed crawler and content distribution service, Superfeedr works by continuously polling feeds from across the web, detecting changes, and then pushing those updates to subscribers through various protocols including WebSub (formerly PubSubHubbub) and webhooks. This eliminates the need for clients to repeatedly check feeds themselves, reducing server load and ensuring timely content delivery.
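In WebSub, the push model described above starts with a verification handshake: the hub sends the subscriber a GET request carrying hub.mode, hub.topic, and hub.challenge parameters, and the subscriber confirms by echoing the challenge back with a 2xx status. A minimal sketch of that verification step (the callback URL and topic are illustrative, not Superfeedr-specific):

```python
from urllib.parse import parse_qs, urlparse

# WebSub subscription verification: the hub sends a GET with hub.mode,
# hub.topic, and hub.challenge; the subscriber echoes the challenge back
# with a 2xx status to confirm intent, or returns non-2xx to refuse.
def verify_subscription(url: str, expected_topic: str) -> tuple[int, str]:
    params = parse_qs(urlparse(url).query)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic == expected_topic:
        return 200, challenge   # echo the challenge to accept
    return 404, ""              # any non-2xx rejects the request

status, body = verify_subscription(
    "https://example.com/callback?hub.mode=subscribe"
    "&hub.topic=https://example.com/feed.xml&hub.challenge=abc123",
    expected_topic="https://example.com/feed.xml",
)
```

Once verified, the hub delivers new content to the same callback URL via POST, so the subscriber never has to poll the feed itself.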
When Superfeedr visits your site, it identifies itself with a user agent string that typically follows this format: Superfeedr Bot (feed-id=12345). The inclusion of a feed-id parameter is a distinctive characteristic that helps trace requests to specific feed subscriptions, making debugging easier for both Superfeedr's team and publishers.
Why is Superfeedr crawling my site?
Superfeedr is crawling your site because someone has subscribed to one or more of your RSS, Atom, or JSON feeds through a service that uses Superfeedr's infrastructure. The bot specifically targets feed URLs (typically ending in .xml, .rss, .atom, or designated feed endpoints) to check for new or updated content.
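A rough heuristic for spotting which of your URLs look like the feed endpoints described above might be (the suffix and path lists are illustrative; real feed discovery also honors `<link rel="alternate">` tags in your HTML, which this skips):

```python
from urllib.parse import urlparse

# Illustrative feed-endpoint heuristic: matches common feed file
# extensions and conventional feed paths. Not exhaustive.
FEED_SUFFIXES = (".xml", ".rss", ".atom", ".json")
FEED_PATHS = ("/feed", "/feeds", "/rss", "/atom")

def looks_like_feed(url: str) -> bool:
    path = urlparse(url).path.rstrip("/").lower()
    return (
        path.endswith(FEED_SUFFIXES)
        or path in FEED_PATHS
        or any(path.startswith(p + "/") for p in FEED_PATHS)
    )
```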
The frequency of Superfeedr's visits depends on several factors, including how often you update your content, the number of subscribers to your feeds, and the subscription tier of those using Superfeedr's services. For actively updated sites, Superfeedr may check multiple times per hour, while less frequently updated sites might see visits once or twice daily.
Crawling is triggered when either a new subscription to your feed is created, or when Superfeedr performs its regular polling cycle to check for updates. This crawling is generally considered authorized as it's accessing publicly available feed endpoints that are designed for machine consumption.
What is the purpose of Superfeedr?
Superfeedr serves as a middleware layer in the content distribution ecosystem, helping to efficiently distribute web content updates across the internet. Its primary purpose is to reduce the inefficiencies of traditional feed polling where hundreds or thousands of feed readers might independently check the same feed, creating unnecessary server load.
The service supports applications, feed readers, and content aggregation platforms by providing a single standardized API to access feed content. When Superfeedr detects new content in your feeds, it normalizes the data format and pushes it to subscribers in real-time, ensuring they receive timely updates.
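The idea behind that normalization step can be sketched as mapping RSS 2.0 items and Atom entries onto one common structure. Superfeedr's actual output schema differs from this; the example only illustrates smoothing format differences behind a single shape:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Illustrative normalizer: map RSS 2.0 <item> and Atom <entry> elements
# to a common {"title", "url"} dict, so downstream consumers never need
# to care which feed format the publisher chose.
def normalize(feed_xml: str) -> list[dict]:
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):              # RSS 2.0 items
        items.append({
            "title": item.findtext("title", ""),
            "url": item.findtext("link", ""),
        })
    for entry in root.iter(f"{ATOM_NS}entry"):  # Atom entries
        link = entry.find(f"{ATOM_NS}link")
        items.append({
            "title": entry.findtext(f"{ATOM_NS}title", ""),
            "url": link.get("href", "") if link is not None else "",
        })
    return items

rss = """<rss version="2.0"><channel>
  <item><title>Hello</title><link>https://example.com/hello</link></item>
</channel></rss>"""
entries = normalize(rss)
```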
For website owners, Superfeedr indirectly provides value by reducing server load (as multiple clients are replaced by a single Superfeedr crawler) and potentially increasing content reach by making it easier for various platforms to consume and redistribute your content. The service essentially helps your content reach its intended audience more efficiently.
How do I block Superfeedr?
Superfeedr respects the robots.txt protocol, so you can control its access to your site by adding appropriate directives to your robots.txt file. To block Superfeedr completely, add the following to your robots.txt:
User-agent: Superfeedr Bot
Disallow: /
If you only want to block it from certain sections of your site while allowing it to access your feeds, you can use more specific directives:
User-agent: Superfeedr Bot
Disallow: /private/
Disallow: /members/
Allow: /feeds/
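You can check how rules like these affect the bot before deploying them, using Python's standard-library parser. Note that robotparser matches the User-agent token against the start of the full UA string, so the "Superfeedr Bot (feed-id=12345)" agent matches a "Superfeedr Bot" record:

```python
from urllib import robotparser

# Dry-run the selective robots.txt rules against Superfeedr's user agent.
rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: Superfeedr Bot
Disallow: /private/
Disallow: /members/
Allow: /feeds/
""".splitlines())

ua = "Superfeedr Bot (feed-id=12345)"
feed_ok = rp.can_fetch(ua, "https://example.com/feeds/posts.xml")
private_ok = rp.can_fetch(ua, "https://example.com/private/page")
```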
Keep in mind that blocking Superfeedr may impact how your content is distributed to services and applications that rely on its infrastructure. If your site publishes content that you want others to discover and consume through feed readers or content aggregation services, blocking Superfeedr could reduce your content's visibility and reach.
Alternatively, if you're experiencing issues with Superfeedr's crawling behavior but don't want to block it entirely, you might consider implementing conditional rate limiting at the server level to manage crawl frequency while still allowing access to your feeds.
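Rate limiting of this kind is usually configured in the web server or a reverse proxy, but the mechanism can be sketched at the application level as a token bucket keyed by user agent. The rate and burst values below are illustrative, not a recommendation:

```python
import time
from collections import defaultdict

# Minimal token-bucket limiter keyed by user agent: each key accrues
# `rate` tokens per second up to `burst`; a request spends one token
# and is refused when the bucket is empty.
class RateLimiter:
    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.buckets = defaultdict(lambda: (burst, time.monotonic()))

    def allow(self, key: str) -> bool:
        tokens, last = self.buckets[key]
        now = time.monotonic()
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False

limiter = RateLimiter(rate=1.0, burst=5)   # ~1 req/s with bursts of 5
results = [limiter.allow("Superfeedr Bot") for _ in range(10)]
```

Because the bucket refills over time, occasional feed checks pass untouched while rapid bursts beyond the configured capacity are turned away.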