Meta-ExternalFetcher
What is Meta-ExternalFetcher?
Meta-ExternalFetcher is a specialized web crawler operated by Meta (formerly Facebook) that performs user-initiated fetches of individual links to support specific product functions across Meta's family of apps. It is part of Meta's AI infrastructure and enables real-time content retrieval for the Meta AI chatbot on platforms such as WhatsApp, Instagram, and Facebook. When a Meta AI user asks for information beyond the model's training data, or asks it to verify a specific claim, Meta-ExternalFetcher fetches the target URL's content for immediate processing.
The crawler identifies itself in server logs with the user agent string meta-externalfetcher/1.1 or meta-externalfetcher/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler). Unlike traditional web crawlers that systematically index content, Meta-ExternalFetcher is activated on demand: it sends an HTTP request only when a Meta AI user asks for a specific URL to be fetched. This makes its behavior distinct from Meta's other crawlers, such as Meta-ExternalAgent, which crawls the web for training AI models or improving products through direct content indexing.
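To spot these visits in your own logs, one option is to match the product token in the User-Agent value. The following is a minimal Python sketch, not an official detection method. The function names and the choice to ignore the version suffix (so that a future change from 1.1 would still match) are assumptions made for illustration.

```python
import re
from typing import Iterable

# Assumption: match only the product name, case-insensitively, and ignore
# the "/1.1" version suffix so that a version bump would still be detected.
_FETCHER_RE = re.compile(r"\bmeta-externalfetcher\b", re.IGNORECASE)


def is_meta_external_fetcher(user_agent: str) -> bool:
    """Return True if the User-Agent value names Meta-ExternalFetcher."""
    return bool(_FETCHER_RE.search(user_agent or ""))


def count_fetcher_hits(log_lines: Iterable[str]) -> int:
    """Count log lines that mention the fetcher's user agent anywhere."""
    return sum(1 for line in log_lines if is_meta_external_fetcher(line))
```

Because a User-Agent header can be set by any client, a match only shows that a request claims to be Meta-ExternalFetcher.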
Why is Meta-ExternalFetcher crawling my site?
Meta-ExternalFetcher visits websites primarily when users of Meta's AI products specifically request information from your site. This happens when a user asks Meta AI to retrieve information from a specific URL or when the AI needs to verify information by checking a source. Unlike broader crawling operations, these visits are targeted and user-initiated.
The frequency of visits depends entirely on how often users direct Meta's AI to access content from your site. Popular sites with frequently shared content may see more visits than less-trafficked websites. Content that can be parsed for immediate answers is the most useful to the fetcher, especially pages with structured data markup, which makes information extraction more efficient.
What is the purpose of Meta-ExternalFetcher?
Meta-ExternalFetcher serves as the bridge between Meta's AI systems and real-time web data. Its primary purpose is to support Meta AI's ability to provide users with up-to-date information beyond what's available in its training data. When users ask questions requiring current information or request verification of specific claims, Meta-ExternalFetcher retrieves the necessary content.
The collected data is used to generate immediate responses to user queries through Meta's Retrieval-Augmented Generation (RAG) pipeline. This process allows Meta AI to provide more accurate, timely, and verifiable information to users. For website owners, this can mean increased visibility of your content within Meta's AI ecosystem, potentially driving engagement when your site is referenced as a source of information.
How do I block Meta-ExternalFetcher?
Controlling Meta-ExternalFetcher access to your site can be approached through standard methods, though with some important caveats. According to Meta's official documentation, Meta-ExternalFetcher may bypass robots.txt rules because it performs fetches that were initiated by a user. This behavior differs from Meta's other crawlers like Meta-ExternalAgent, which strictly adheres to robots.txt directives.
Despite this limitation, you can still attempt to use robots.txt to signal your preferences:
User-agent: Meta-ExternalFetcher
Disallow: /
Or to block specific directories:
User-agent: Meta-ExternalFetcher
Disallow: /private/
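Before deploying rules like these, you can check how a standards-following parser would read them. The sketch below uses Python's standard-library urllib.robotparser on the directory rule above. The helper name allowed and the sample paths are hypothetical. It shows only how a compliant client would interpret the file, not what Meta-ExternalFetcher will actually do.

```python
from urllib.robotparser import RobotFileParser

# The directory-level rule from the example above.
ROBOTS_TXT = """\
User-agent: Meta-ExternalFetcher
Disallow: /private/
"""


def allowed(path: str, agent: str = "Meta-ExternalFetcher") -> bool:
    """Return True if a compliant parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for sample in ("/private/report.html", "/blog/post"):
        print(sample, allowed(sample))
```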
Since Meta-ExternalFetcher may bypass these directives for user-initiated requests, you might need to implement additional measures if complete blocking is required. Server-side blocking based on the user agent string can be more effective, though this approach would block all Meta-ExternalFetcher traffic, including legitimate user requests.
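As one illustration of server-side blocking, the sketch below wraps a Python WSGI application in a small middleware that answers 403 to any request whose User-Agent contains the fetcher's token. The class name, the demo application, and the choice of a 403 response are assumptions for this example. The same check can usually be expressed in a web server or CDN configuration instead.

```python
# Token taken from the documented user agent string; compared in lowercase.
BLOCKED_TOKEN = "meta-externalfetcher"


class BlockMetaExternalFetcher:
    """WSGI middleware (hypothetical name) that rejects the fetcher with 403."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_TOKEN in user_agent.lower():
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other client is passed through to the wrapped application.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in application used only to demonstrate the middleware."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = BlockMetaExternalFetcher(demo_app)
```

This filter relies on the self-reported User-Agent header, so it stops requests that identify themselves as Meta-ExternalFetcher and nothing else. It also means that Meta AI users who ask about your pages will get no content from them.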
User agent: meta-externalfetcher/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)