NewsBlur bot
What is NewsBlur?
NewsBlur is a popular RSS feed reader and news aggregation service that helps users follow and organize content from their favorite websites. Created and operated by Samuel Clay, NewsBlur allows users to subscribe to RSS feeds from various websites and read all their content in one centralized location. The service was first deployed in 2009 and has evolved into a full-featured news reading platform available at NewsBlur.com.
Technically, NewsBlur functions as a specialized web crawler that fetches RSS/Atom feeds and webpage content. It operates by regularly checking subscribed feeds for new content, which it then processes and displays to users in a readable format. The service also retrieves full webpage content and site favicons to enhance the reading experience.
In server logs, NewsBlur identifies itself through several user agent strings depending on its task: NewsBlur Feed Fetcher, NewsBlur Page Fetcher, and NewsBlur Favicon Fetcher, typically followed by subscriber count information. For example: NewsBlur Feed Fetcher - 7 subscribers - http://www.newsblur.com/site/1948420/analytics-piwik
A distinctive characteristic of NewsBlur is that it includes the number of subscribers for each feed in its user agent string, providing website owners with transparency about their readership through the service.
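If you want to pull that subscriber information out of your logs programmatically, a small regular expression is enough. This is a minimal sketch, not an official NewsBlur tool; the function name `parse_newsblur_ua` and the exact pattern are illustrative assumptions based on the user agent format shown above.

```python
import re

# Matches NewsBlur user agents of the form shown above, e.g.:
#   NewsBlur Feed Fetcher - 7 subscribers - http://www.newsblur.com/site/...
# Captures the fetcher type (Feed/Page/Favicon) and the subscriber count.
NEWSBLUR_UA = re.compile(
    r"NewsBlur (?P<fetcher>Feed|Page|Favicon) Fetcher"
    r"(?: - (?P<subscribers>\d+) subscribers?)?"
)

def parse_newsblur_ua(user_agent):
    """Return (fetcher_type, subscriber_count) or None if not NewsBlur."""
    match = NEWSBLUR_UA.search(user_agent)
    if not match:
        return None
    subs = match.group("subscribers")
    return match.group("fetcher"), (int(subs) if subs else None)

ua = ("NewsBlur Feed Fetcher - 7 subscribers - "
      "http://www.newsblur.com/site/1948420/analytics-piwik")
print(parse_newsblur_ua(ua))  # ('Feed', 7)
```

Running this over your access log's user agent column gives you a quick readership estimate per feed.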
Why is NewsBlur crawling my site?
NewsBlur crawls websites primarily to retrieve RSS/Atom feed content that users have subscribed to. If you're seeing NewsBlur in your logs, it means at least one NewsBlur user has subscribed to your site's feed. The crawler fetches new content to keep subscribers updated when you publish new articles or posts.
The frequency of visits depends on how often you update your content and how many NewsBlur users subscribe to your feed. Sites with more frequent updates and higher subscriber counts may see more regular visits. NewsBlur may also fetch your site's favicon and full webpage content to provide enhanced reading features to its users.
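To see how often NewsBlur actually polls your site, you can tally its requests from your access log. The sketch below assumes combined log format; the sample log lines are hypothetical stand-ins for what you would read from your server's log file.

```python
from collections import Counter

# Hypothetical access-log lines in combined log format; in practice,
# read these from your web server's access log instead.
log_lines = [
    '1.2.3.4 - - [10/May/2024:06:00:01 +0000] "GET /feed.xml HTTP/1.1" 200 5120 "-" "NewsBlur Feed Fetcher - 7 subscribers - http://www.newsblur.com/site/1948420/analytics-piwik"',
    '1.2.3.4 - - [10/May/2024:07:00:02 +0000] "GET /feed.xml HTTP/1.1" 304 0 "-" "NewsBlur Feed Fetcher - 7 subscribers - http://www.newsblur.com/site/1948420/analytics-piwik"',
    '5.6.7.8 - - [10/May/2024:07:05:00 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# In combined log format, the user agent is the sixth quoted field;
# splitting on the " - " separator isolates the fetcher name.
hits = Counter(
    line.split('"')[5].split(" - ")[0]
    for line in log_lines
    if "NewsBlur" in line
)
print(hits)  # Counter({'NewsBlur Feed Fetcher': 2})
```

Comparing hit counts against your publishing schedule shows whether the polling rate is proportionate to how often you update.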
This crawling is generally considered authorized as it's retrieving publicly available content that you've chosen to publish in RSS format, which is specifically designed for syndication and aggregation services like NewsBlur.
What is the purpose of NewsBlur?
NewsBlur serves as a content aggregation and reading platform that helps users efficiently consume content from multiple sources in one place. The service collects RSS feed data to provide its users with a streamlined reading experience, allowing them to follow hundreds of sites without having to visit each one individually.
The collected data is used to populate users' reading streams and provide features like content filtering, search, and social sharing. NewsBlur also offers an "Intelligence Trainer" feature that helps users filter content based on their preferences.
For website owners, NewsBlur provides value by expanding your content's reach and making it more accessible to readers. The subscriber count in the user agent string gives you insight into how many NewsBlur users follow your content, which can be valuable feedback about your audience.
How do I block NewsBlur?
NewsBlur respects the standard robots.txt protocol, so you can control its access to your site using this method. If you wish to block NewsBlur from crawling your entire site, add the following to your robots.txt file:
User-agent: NewsBlur
Disallow: /
If you want to block only specific sections of your site, you can specify those paths:
User-agent: NewsBlur
Disallow: /private/
Disallow: /members-only/
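Before deploying rules like these, you can verify how they affect the NewsBlur user agent with Python's standard robots.txt parser. This is a sketch using the stdlib urllib.robotparser module; the example paths are assumptions.

```python
import urllib.robotparser

# The partial-block rules from above, verified with the stdlib parser.
rules = """\
User-agent: NewsBlur
Disallow: /private/
Disallow: /members-only/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("NewsBlur", "/private/archive"))  # False (blocked)
print(rp.can_fetch("NewsBlur", "/blog/feed.xml"))    # True (still allowed)
```

This lets you confirm that feed URLs stay reachable while the restricted sections are excluded.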
Keep in mind that blocking NewsBlur means that users who rely on the service will no longer receive updates from your site, potentially reducing your readership and engagement. Since NewsBlur is a legitimate service that helps distribute your content to interested readers, blocking it might not be beneficial unless you have specific concerns about bandwidth usage or content access.
If you're experiencing issues with excessive crawling, consider reaching out to NewsBlur's support before implementing a complete block, as they may be able to adjust their crawling behavior for your site. The service is designed to be a "good citizen" of the web ecosystem and generally implements reasonable rate limiting to avoid overloading websites.
User Agent
NewsBlur Feed Fetcher - {subscriber_count} subscribers - {site_url} (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)