YandexNews bot
What is YandexNews?
YandexNews is a web crawler developed and operated by Yandex LLC, a major Russian technology company. This bot functions as a specialized indexing crawler designed to discover, collect, and process news content from websites for inclusion in Yandex's news aggregation services. The bot identifies itself in server logs with the user agent string Mozilla/5.0 (compatible; YandexNews/4.0; +http://yandex.com/bots), which includes a reference to Yandex's bot documentation page.
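As a minimal sketch of how the user agent string above could be recognized in application code, the following Python helper extracts the YandexNews version token. The function name and the regular expression are illustrative assumptions, not part of any Yandex tooling. Any client can set its user agent freely, so a match is a hint about the visitor rather than proof of its identity.

```python
import re
from typing import Optional

# Matches the product token from the user agent quoted above.
# The version is captured loosely because it may change over time.
YANDEX_NEWS_RE = re.compile(r"\bYandexNews/(\d+(?:\.\d+)*)")


def yandex_news_version(user_agent: str) -> Optional[str]:
    """Return the YandexNews version found in a user agent, or None.

    Hypothetical helper for illustration only.
    """
    match = YANDEX_NEWS_RE.search(user_agent)
    return match.group(1) if match else None
```

Called with the user agent string shown above, the helper returns the version portion of the token; for user agents without a YandexNews token it returns None.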
As part of Yandex's broader ecosystem of search and information services, YandexNews systematically visits websites to identify fresh news content, headlines, publication dates, and other metadata that helps Yandex categorize and present news to its users. The bot exhibits regular crawling patterns, typically focusing on websites with frequently updated content such as news outlets, blogs, and publications that produce timely information.
Why is YandexNews crawling my site?
YandexNews crawls websites primarily to discover and index news content for inclusion in Yandex's news services. If this bot is visiting your site, it likely means your website contains content that appears to be news-related or regularly updated information that could be valuable to news service users.
The frequency of YandexNews visits depends on how often you publish new content and the perceived value of your content to Yandex's news services. Websites that publish breaking news or timely updates may experience more frequent crawling than those with static content. The bot is programmed to discover new articles, press releases, and other news-worthy content to keep Yandex's news offerings current and comprehensive.
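To see how often the bot actually visits, one option is to count its requests per day in your server's access log. The sketch below assumes the log uses the common "combined" format, where the timestamp appears in square brackets and the user agent is included on each line. The function name is hypothetical, and the pattern would need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumption: timestamps look like [10/Oct/2024:13:55:36 +0000],
# as in the combined log format. Only the date part is captured.
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def yandex_news_hits_per_day(log_lines):
    """Count log lines that mention YandexNews, grouped by date."""
    hits = Counter()
    for line in log_lines:
        # Skip requests from every other client.
        if "YandexNews/" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly, so large logs do not need to be loaded into memory first.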
YandexNews crawling is generally considered legitimate crawler activity: it is part of Yandex's search and news aggregation operations, in the same way that other major search engines run their own specialized news crawlers.
What is the purpose of YandexNews?
The primary purpose of YandexNews is to power Yandex's news aggregation services by collecting, categorizing, and indexing news content from across the web. This enables Yandex to provide its users with current news stories organized by topics, regions, and relevance.
For publishers and website owners, having content included in Yandex's news services can drive significant traffic to their sites and increase content visibility among Yandex users. The bot helps Yandex identify sources of quality news content and maintain an up-to-date index of current events and stories.
The data collected by YandexNews is used to populate Yandex's news platforms, allowing users to discover relevant news content without having to visit multiple news sites individually. This creates a centralized news experience while still directing users to the original sources for full articles.
How do I block YandexNews?
YandexNews respects the standard robots.txt protocol, making it relatively straightforward to control its access to your website. If you wish to block YandexNews from crawling your entire site, you can add the following directives to your robots.txt file:
User-agent: YandexNews
Disallow: /
This instruction specifically targets the YandexNews bot while allowing other Yandex crawlers to continue accessing your site. If you want to block all Yandex bots, you could use:
User-agent: Yandex
Disallow: /
For more selective control, you can block access to specific directories or files while allowing access to others:
User-agent: YandexNews
Disallow: /private/
Disallow: /members/
Allow: /news/
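Before deploying rules like these, it can help to check locally which paths they allow. The sketch below uses Python's standard urllib.robotparser module with the selective rules above. The helper name is an assumption made for illustration, and Python's parser only approximates crawler behavior, so the result is not a guarantee of how Yandex's crawler itself resolves the rules.

```python
from urllib.robotparser import RobotFileParser

# The selective rules from the example above.
RULES = """\
User-agent: YandexNews
Disallow: /private/
Disallow: /members/
Allow: /news/
"""


def can_yandex_news_fetch(path: str, rules: str = RULES) -> bool:
    """Report whether the given rules let YandexNews fetch a path.

    Hypothetical helper; evaluates the rules with Python's own parser.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("YandexNews", path)
```

With these rules, paths under /news/ are reported as allowed and paths under /private/ or /members/ as blocked. Paths that match no rule are allowed by default.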
Blocking YandexNews may result in your content not appearing in Yandex's news services, which could reduce visibility and traffic from users of those platforms. If your site relies on traffic from Yandex's ecosystem, especially in regions where Yandex has significant market share like Russia and parts of Eastern Europe, carefully consider the potential impact before implementing blocks.
If you want your content to be indexed but have specific requirements about how it appears in news results, consider reviewing Yandex's publisher guidelines for more targeted approaches that maintain your presence while addressing specific concerns.
User Agent
Mozilla/5.0 (compatible; YandexNews/4.0; +http://yandex.com/bots)