Hatena bot
What is Hatena?
Hatena refers to a collection of web crawlers and bots operated by Hatena, a Japanese internet company that offers social media and content discovery services. These bots serve different functions within Hatena's ecosystem, from bookmarking and content analysis to favicon retrieval and blog services. The company has been operating since 2001, and its suite of web services depends on these crawlers.
Technically, Hatena operates multiple specialized web crawlers that identify themselves through distinct user-agent strings. Common variants include HatenaBlog-bot/0.02, Hatena::Scissors/0.01, HatenaBookmark/4.0, Hatena-Favicon/2, and Hatena Star UserAgent/2. Each bot serves a specific purpose within Hatena's network of services, which includes social bookmarking, blogging platforms, and content analysis tools.
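To see which variant is visiting, the user-agent header can be matched against the product names listed above. The following is a minimal sketch, not an official Hatena tool: the function name identify_hatena_bot is hypothetical, and it assumes the product name is followed by a slash and a numeric version, as in the strings quoted here. Real requests may carry variants not covered by this list.

```python
import re

# Product names taken from the user-agent strings listed in this article.
# Other Hatena variants may exist; this list is illustrative, not exhaustive.
HATENA_PRODUCTS = [
    "HatenaBlog-bot",
    "Hatena::Scissors",
    "HatenaBookmark",
    "Hatena-Favicon",
    "Hatena Star UserAgent",
]


def identify_hatena_bot(user_agent):
    """Return (product, version) for a known Hatena product, else None.

    Assumes the "<product>/<version>" shape seen in the examples above.
    """
    for product in HATENA_PRODUCTS:
        # re.escape keeps characters such as "-" and ":" literal.
        pattern = re.escape(product) + r"/(\d+(?:\.\d+)*)"
        match = re.search(pattern, user_agent)
        if match:
            return product, match.group(1)
    return None


if __name__ == "__main__":
    sample = "HatenaBlog-bot/0.02 (+https://help.hatenablog.com/entry/about-hatenablogbot)"
    print(identify_hatena_bot(sample))
```

Because user-agent strings are trivially spoofed, a match only indicates what the client claims to be.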
These crawlers typically operate from Japanese IP addresses, often from Amazon AWS or Google Cloud infrastructure. They are designed to respect standard web conventions such as robots.txt while collecting only the information Hatena's services need.
Why is Hatena crawling my site?
Hatena bots crawl websites primarily when users of Hatena's services interact with your content. The most common scenarios include:
When someone bookmarks your page on Hatena Bookmark (similar to services like Pocket or Delicious), the crawler visits to extract metadata, analyze content, and generate previews for users.
If your site is referenced on a Hatena Blog, their blog-specific crawler may visit to establish connections between content.
The Hatena Favicon crawler specifically looks for your site's favicon to display alongside bookmarks or mentions of your site.
Crawling frequency depends on how often Hatena users engage with your content. Popular sites that receive many bookmarks or mentions will see more frequent visits from these bots. This crawling is generally considered a normal part of web operations, similar to how search engines index content.
What is the purpose of Hatena?
Hatena's crawlers support a network of social media and content discovery services popular in Japan. The primary services include Hatena Bookmark (a social bookmarking platform), Hatena Blog (a blogging service), and several complementary tools that help users discover, save, and share web content.
The data collected by these bots enables Hatena to provide rich previews of bookmarked content, analyze relationships between websites, and enhance user experience through content categorization and recommendation systems. For instance, when a user bookmarks your page, Hatena needs to capture information about your content to display it properly within their service.
For website owners, Hatena can drive traffic from Japanese users to your site if your content becomes popular among its user base. Being bookmarked or featured on Hatena services can increase visibility within the Japanese internet ecosystem.
How do I block Hatena?
Most Hatena bots respect the robots.txt protocol, making it the simplest way to control their access to your site. To block all Hatena bots, you can add these directives to your robots.txt file:
User-agent: Hatena
Disallow: /
User-agent: HatenaBlog-bot
Disallow: /
User-agent: Hatena::Scissors
Disallow: /
User-agent: HatenaBookmark
Disallow: /
User-agent: Hatena-Favicon
Disallow: /
For more selective control, you can block specific paths rather than your entire site. For example, to block only a private section:
User-agent: HatenaBookmark
Disallow: /private/
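Before deploying rules like these, it can help to check how a robots.txt parser interprets them. The sketch below uses Python's standard urllib.robotparser on the selective rule above; the helper name is_allowed and the example.com URLs are hypothetical. It shows how Python's parser reads the rules, which is not necessarily identical to how Hatena's crawlers read them.

```python
from urllib.robotparser import RobotFileParser

# The selective rule from this article: keep HatenaBookmark out of /private/.
SELECTIVE_RULES = """\
User-agent: HatenaBookmark
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    Reflects the behavior of Python's parser only; a real crawler
    may interpret edge cases differently.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    for url in ("https://example.com/private/report.html",
                "https://example.com/blog/post"):
        print(url, is_allowed(SELECTIVE_RULES, "HatenaBookmark/4.0", url))
```

With the selective rule, the /private/ URL is reported as disallowed for HatenaBookmark while the blog URL stays allowed, and user agents with no matching group are unaffected.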
Some Hatena bots, particularly older variants, may not consistently respect robots.txt. In these cases, you might need to implement server-side blocking based on user-agent strings or IP addresses. However, this approach requires more technical expertise and maintenance.
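Server-side blocking by user-agent can be sketched as a small WSGI middleware. This is one possible approach under stated assumptions, not a recommended or official configuration: the class name BlockHatenaMiddleware is hypothetical, and the check assumes every Hatena bot includes the substring "hatena" in its user-agent header, which holds for the strings quoted in this article but may not hold for every variant. IP-based blocking is not shown because no address ranges are given here.

```python
def is_hatena_user_agent(user_agent):
    """Case-insensitive check for the substring "hatena" (an assumption)."""
    return "hatena" in user_agent.lower()


class BlockHatenaMiddleware:
    """WSGI middleware that answers 403 to requests claiming to be a Hatena bot."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_hatena_user_agent(user_agent):
            body = b"Forbidden\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Any other client is passed through to the wrapped application.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Placeholder application used only to exercise the middleware."""
    body = b"Hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Wrap the application once at startup.
application = BlockHatenaMiddleware(demo_app)
```

The same substring test can be expressed in a web server or CDN rule instead. Either way it relies on the client reporting its user-agent honestly.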
Consider the implications before blocking Hatena completely. While reducing server load might be beneficial, you'll also lose potential traffic and visibility from Japanese users who might discover your content through Hatena's services. If your site targets or welcomes Japanese audiences, the crawling activity may be worth accommodating.
User Agent: HatenaBlog-bot/0.02 (+https://help.hatenablog.com/entry/about-hatenablogbot)