What is MJ12bot?

MJ12bot is a web crawler operated by Majestic, a UK-based specialist search engine company. It functions as an SEO crawler designed to map the link relationships between websites across the internet. The bot identifies itself in server logs with a user agent string like Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/), where the version number may vary. Currently operating versions include v1.4.8 (released April 2017) and newer.
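As a rough illustration of spotting the crawler in server logs, the sketch below pulls the version out of a user agent string of the form quoted above. The function name and regular expression are hypothetical, written only against that one example string, and a matching user agent does not by itself prove that a request comes from Majestic.

```python
import re
from typing import Optional

# Pattern derived from the example user agent above; the version part varies.
MJ12BOT_RE = re.compile(r"MJ12bot/v(\d+(?:\.\d+)*)")

def mj12bot_version(user_agent: str) -> Optional[str]:
    """Return the MJ12bot version found in a user agent string, or None."""
    match = MJ12BOT_RE.search(user_agent)
    return match.group(1) if match else None

ua = "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
print(mj12bot_version(ua))
```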

MJ12bot is part of a distributed crawling system that enables Majestic to build a comprehensive map of the internet independent of consumer-based search engines. Unlike some crawlers that cache web content, MJ12bot primarily focuses on mapping link relationships between websites. This data powers Majestic's backlink analysis tools and search capabilities available through their commercial services.

The crawler is designed to be well-behaved, always identifying itself clearly and adhering to standard crawling protocols. As a distributed crawler, it doesn't operate from a fixed IP range but from various locations worldwide, which makes blocking it by IP address alone impractical. More information about the crawler can be found at mj12bot.com.

Why is MJ12bot crawling my site?

MJ12bot visits websites to discover and analyze the link structure of the internet. It's specifically looking for hyperlinks between pages and domains to build Majestic's backlink database. The crawler follows links it discovers on other websites to reach your site, then continues following internal links to map your site's structure and outbound links.

The frequency of MJ12bot visits varies based on several factors, including your site's popularity, how many inbound links it has, and how frequently your content changes. High-profile sites with many backlinks may see more frequent visits than smaller sites with fewer connections.

MJ12bot may continue to periodically check pages that returned errors (like 404s) or redirects (301s) to ensure temporary issues don't permanently affect your site's profile in their database. It also follows links marked with rel=nofollow attributes, as these still represent navigational paths even if they don't pass SEO value.

What is the purpose of MJ12bot?

MJ12bot serves to build and maintain Majestic's Site Explorer, which is described as "the largest public backlinks search engine index." The data collected helps website owners, SEO professionals, and digital marketers understand their backlink profiles, analyze competitors, and identify link-building opportunities.

The crawler doesn't currently cache web content or personal data. Instead, it focuses exclusively on mapping link relationships. This information is made available through Majestic's commercial tools, allowing users to search for keywords or analyze specific websites to understand their position in the web's link ecosystem.

For website owners, the data collected by MJ12bot can provide valuable insights about who is linking to their content and how their site connects to the broader internet. This information can help inform SEO strategies and content development decisions.

How do I block MJ12bot?

MJ12bot respects the standard robots.txt protocol, making it relatively straightforward to control its access to your site. If you wish to block the bot completely, add the following to your robots.txt file:

User-agent: MJ12bot
Disallow: /
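One way to sanity-check a rule like this before deploying it is Python's standard urllib.robotparser module. This is a minimal sketch: the example.com URL and the "OtherBot" name are placeholders, and Python's parser is only an approximation of how MJ12bot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""

def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Placeholder URL; substitute a page from your own site.
print(is_allowed(ROBOTS_TXT, "MJ12bot", "https://example.com/page"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/page"))
```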

If you want to allow the bot but slow down its crawling to reduce server load, you can use the Crawl-Delay directive:

User-agent: MJ12bot
Crawl-Delay: 5

The number is the delay in seconds between requests. MJ12bot supports values up to 20 seconds; anything higher is rounded down to that maximum.
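The sketch below shows how the 20-second ceiling plays out for a given robots.txt. It reads the Crawl-Delay value with Python's urllib.robotparser and caps it. The function name and the MAX_SUPPORTED_DELAY constant are hypothetical, and the capping logic is an illustration of the behaviour described here, not Majestic's own code.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

MAX_SUPPORTED_DELAY = 20  # seconds; the ceiling described in the text

def effective_crawl_delay(robots_txt: str, agent: str = "MJ12bot") -> Optional[int]:
    """Return the delay the bot would apply, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    requested = parser.crawl_delay(agent)
    if requested is None:
        return None
    # Values above the ceiling are rounded down to it.
    return min(int(requested), MAX_SUPPORTED_DELAY)

print(effective_crawl_delay("User-agent: MJ12bot\nCrawl-Delay: 5"))
print(effective_crawl_delay("User-agent: MJ12bot\nCrawl-Delay: 60"))
```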

The crawler also supports more advanced robots.txt features including pattern matching in Disallow directives, Allow directives that can override Disallow when more specific, and redirects within the same site when fetching robots.txt. If MJ12bot cannot retrieve your robots.txt file, it will assume crawling is permitted.
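To make the Allow-over-Disallow behaviour concrete, here is a minimal sketch that treats "more specific" as "longest matching path prefix". That interpretation is an assumption rather than something confirmed for MJ12bot. The rule paths are invented for illustration, and wildcard pattern matching is left out.

```python
from typing import List, Tuple

def is_path_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Decide access using the longest matching prefix (assumed rule).

    `rules` holds (directive, prefix) pairs, where directive is
    "allow" or "disallow". Ties go to "allow".
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is permitted
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed

# Hypothetical rules: block /private/ but keep /private/press/ open.
rules = [("disallow", "/private/"), ("allow", "/private/press/")]
print(is_path_allowed("/private/press/kit.html", rules))
print(is_path_allowed("/private/notes.html", rules))
```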

Blocking MJ12bot might prevent your site from appearing in Majestic's link database, potentially reducing visibility to users of their SEO tools. However, if you're concerned about server resources or prefer not to have your link structure indexed, blocking is a reasonable option.
