magpie-crawler

What is magpie-crawler?

magpie-crawler is an intelligence-gathering web crawler operated by Brandwatch, a social media monitoring and digital consumer intelligence company. It functions as a conventional web scraper that systematically indexes publicly available content across the internet, including blogs, forums, news sites, and social media platforms. The crawler identifies itself in server logs with a user-agent string that typically appears as magpie-crawler/1.1, or in a longer form such as magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net). The longer form exposes the crawler version (1.1), operating system (Linux), processor architecture (amd64), locale (en-GB), and a reference URL for the operator.
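
As an illustration of how the user-agent string above breaks down, here is a minimal Python sketch that splits it into a version and a list of detail fields. The function name `parse_magpie_user_agent` and the regular expression are assumptions made for this example, not anything published by Brandwatch, and a matching string is not proof of origin, since user-agent headers can be set by any client.

```python
import re

# Pattern for the two user-agent forms quoted above. The group names are
# illustrative only and are not part of any Brandwatch specification.
UA_PATTERN = re.compile(
    r"^magpie-crawler/(?P<version>[\d.]+)"
    r"(?:\s+\((?P<details>[^)]*)\))?$"
)


def parse_magpie_user_agent(user_agent):
    """Return a dict of fields if the string looks like magpie-crawler, else None."""
    match = UA_PATTERN.match(user_agent.strip())
    if match is None:
        return None
    info = {"version": match.group("version"), "details": []}
    if match.group("details"):
        # The parenthesised part is a semicolon-separated list of fields.
        info["details"] = [part.strip() for part in match.group("details").split(";")]
    return info


print(parse_magpie_user_agent(
    "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
))
# {'version': '1.1', 'details': ['U', 'Linux amd64', 'en-GB', '+http://www.brandwatch.net']}
```

The short form, magpie-crawler/1.1, yields the same version with an empty details list, and any other user agent returns None.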

Unlike AI-driven crawlers, magpie-crawler is designed primarily for efficient data collection rather than content analysis. It adheres to standard web protocols, including respecting robots.txt directives, making it a legitimate and transparent web crawler. Brandwatch uses this tool to aggregate data from millions of web pages daily, enabling their clients to monitor brand mentions, track market trends, and analyze public discourse across digital platforms.

Why is magpie-crawler crawling my site?

magpie-crawler visits websites to collect publicly available information that may be relevant to Brandwatch's clients. If you're seeing this crawler in your logs, it's likely because your site contains content related to brands, products, or industry keywords that Brandwatch clients are monitoring. The crawler is particularly interested in content that expresses opinions, reviews, or discussions about specific companies, products, or topics.

The frequency of visits depends on the relevance of your content to Brandwatch clients' monitoring needs. Sites with frequent mentions of popular brands or trending topics may experience more regular visits. The crawler operates continuously but typically distributes its requests to avoid overwhelming servers. Its crawling is authorized in the sense that it accesses only publicly available content and respects standard access control mechanisms.
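
To see how often the crawler actually visits, one option is to count its requests per day in your access log. The sketch below assumes a log in the common "combined" format, where the timestamp appears in square brackets and the user agent appears somewhere on the same line. The function name `magpie_hits_per_day` and the sample lines (which use documentation-range IP addresses) are made up for illustration.

```python
import re
from collections import Counter

# Matches the date part of a combined-log timestamp such as [10/Oct/2024:13:55:36 +0000].
DAY_PATTERN = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def magpie_hits_per_day(log_lines):
    """Count requests per day whose log line mentions magpie-crawler."""
    hits = Counter()
    for line in log_lines:
        if "magpie-crawler" not in line:
            continue
        day = DAY_PATTERN.search(line)
        if day:
            hits[day.group(1)] += 1
    return hits


# Hypothetical sample lines; real logs would be read from a file instead.
sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /blog HTTP/1.1" 200 512 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"',
    '203.0.113.5 - - [10/Oct/2024:14:02:10 +0000] "GET /forum HTTP/1.1" 200 734 "-" "magpie-crawler/1.1"',
    '198.51.100.7 - - [10/Oct/2024:14:03:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
print(magpie_hits_per_day(sample))
# Counter({'10/Oct/2024': 2})
```

To run it against a real log, pass an open file object in place of `sample`; the path and log format depend on your server configuration.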

What is the purpose of magpie-crawler?

magpie-crawler supports Brandwatch's social listening and digital consumer intelligence platform. The data it collects helps organizations monitor online mentions of their brands, analyze market trends, track sentiment around products or campaigns, and gain insights into consumer behavior and preferences. This intelligence enables Brandwatch clients to make informed marketing decisions, manage their online reputation, and understand public perception of their brand.

For website owners, there's no direct benefit from being crawled by magpie-crawler, unlike search engine bots that may drive traffic to your site. However, if your content influences brand perception or consumer opinions, being included in Brandwatch's analysis might indirectly increase your content's impact on business decisions. The crawler is designed to be respectful of server resources and follows standard web etiquette, so it shouldn't cause performance issues for most websites.

How do I block magpie-crawler?

magpie-crawler respects the robots.txt protocol, making it straightforward to control its access to your website. If you wish to block it completely, add the following directives to your robots.txt file:

User-agent: magpie-crawler
Disallow: /

This will instruct the crawler not to access any part of your website. If you prefer to allow access to certain areas while restricting others, you can specify particular directories or pages to disallow:

User-agent: magpie-crawler
Disallow: /private/
Disallow: /members/
Allow: /
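
Before deploying rules like the ones above, you can sanity-check them locally. The following sketch uses Python's standard-library `urllib.robotparser` to evaluate the partial-access example. The host `example.com` and the helper `magpie_allowed` are placeholders for illustration. Python's parser applies the rules in file order, so treat the result as an approximation of how a given crawler will interpret the file rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The partial-access rules from the example above.
ROBOTS_TXT = """\
User-agent: magpie-crawler
Disallow: /private/
Disallow: /members/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def magpie_allowed(path):
    """Return True if the rules permit magpie-crawler to fetch the given path."""
    # example.com is a placeholder host; only the path matters for the check.
    return parser.can_fetch("magpie-crawler/1.1", "https://example.com" + path)


print(magpie_allowed("/blog/post"))      # True
print(magpie_allowed("/private/notes"))  # False
print(magpie_allowed("/members/"))       # False
```

Swapping the rule body for a single `Disallow: /` line makes every path return False, which corresponds to the full block shown earlier.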

Blocking magpie-crawler won't affect your search engine rankings or visibility to users, as it's not connected to search engine functionality. However, it does mean that any mentions of brands or topics on your site won't be included in Brandwatch's analysis, potentially reducing your content's influence on brand strategies and market research. Brandwatch provides documentation about their crawler at their website, and they're generally responsive to webmasters' concerns about crawl rates or behavior if direct contact is needed.

Operated by: Brandwatch (data collector)

AI model training: Not used to train AI or LLMs
Acts on behalf of user: No, operates independently of any user action
Obeys directives: Yes, obeys robots.txt rules
User Agent: magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)