What is MojeekBot?

MojeekBot is the web crawler developed and operated by Mojeek, an independent search engine company based in the UK. Formerly known as Citenikbot, it is the primary crawler for the Mojeek search engine: it systematically browses the web to discover, analyze, and index content that then becomes searchable through Mojeek.

The bot identifies itself in server logs with user-agent strings such as Mozilla/5.0 (compatible; MojeekBot/0.11; +https://www.mojeek.com/bot.html). The version number has evolved over time, with versions ranging from 0.2 to 2.0, indicating ongoing development and improvements to the crawler's capabilities. The user-agent string also includes a link to Mojeek's bot documentation page, which is a standard practice for legitimate web crawlers to provide transparency about their identity and purpose.
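
As a rough illustration of how the user-agent string above could be recognized in server logs, here is a minimal Python sketch. The function name mojeekbot_version and the regular expression are assumptions made for this example, not something Mojeek publishes.

```python
import re
from typing import Optional

# Matches the product token "MojeekBot/<version>" anywhere in a user-agent string.
# The pattern is an assumption based on the example string quoted above.
MOJEEKBOT_RE = re.compile(r"\bMojeekBot/(\d+(?:\.\d+)*)")


def mojeekbot_version(user_agent: str) -> Optional[str]:
    """Return the MojeekBot version claimed in user_agent, or None if absent."""
    match = MOJEEKBOT_RE.search(user_agent)
    return match.group(1) if match else None


ua = "Mozilla/5.0 (compatible; MojeekBot/0.11; +https://www.mojeek.com/bot.html)"
print(mojeekbot_version(ua))  # 0.11
print(mojeekbot_version("Mozilla/5.0 (X11; Linux x86_64)"))  # None
```

A user-agent string is supplied by the client and can be spoofed, so a match only shows that a request claims to come from MojeekBot.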

MojeekBot works by visiting websites, following links, and collecting information about web pages to build Mojeek's search index. This process helps Mojeek create and maintain an independent search database that isn't reliant on other search engines' data.

Why is MojeekBot crawling my site?

MojeekBot is crawling your site to discover and index your content for inclusion in Mojeek's search engine results. This is a standard and legitimate practice for search engine crawlers. The bot is looking for all publicly accessible content on your website, including text, images, and links to other pages.

The frequency of MojeekBot visits depends on various factors, including your website's size, popularity, and how often your content changes. More established or frequently updated sites may receive more regular visits as the crawler attempts to keep Mojeek's index current with your latest content.

Crawling publicly accessible pages for indexing is standard practice for search engines, and MojeekBot is designed to respect the crawling rules that website owners publish, such as those in robots.txt.

What is the purpose of MojeekBot?

MojeekBot's primary purpose is to build and maintain the search index for Mojeek, an independent search engine. Unlike many other search services that may license or share index data, Mojeek emphasizes building its own independent index of the web, making MojeekBot essential to its operations.

The data collected by MojeekBot is used to power Mojeek's search results, allowing users to find relevant content across the internet. For website owners, having content indexed by Mojeek provides another channel through which potential visitors can discover their site, potentially increasing visibility and traffic.

Mojeek positions itself as a privacy-focused alternative to larger search engines, emphasizing that it doesn't track users or build profiles based on search history. This philosophy extends to its crawling practices, which aim to be respectful of website resources and owner preferences.

How do I block MojeekBot?

MojeekBot respects the standard robots.txt protocol, making it straightforward to control how it accesses your site. If you wish to block MojeekBot from crawling your entire website, you can add the following directives to your robots.txt file:

User-agent: MojeekBot
Disallow: /

To block MojeekBot from specific sections of your site while allowing access to others, you can specify particular paths:

User-agent: MojeekBot
Disallow: /private/
Disallow: /members-only/
Allow: /
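
Before deploying rules like these, it can help to check them locally. The sketch below uses urllib.robotparser from Python's standard library to evaluate the path-based rules above. The helper name is_allowed and the sample paths are assumptions for illustration, and Python's parser may not behave identically to MojeekBot's own parser in every edge case.

```python
from urllib.robotparser import RobotFileParser

# The path-based rules shown above.
RULES = """\
User-agent: MojeekBot
Disallow: /private/
Disallow: /members-only/
Allow: /
"""


def is_allowed(rules: str, path: str, agent: str = "MojeekBot") -> bool:
    """Return True if the given robots.txt text lets agent fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


print(is_allowed(RULES, "/private/report.html"))   # False
print(is_allowed(RULES, "/members-only/profile"))  # False
print(is_allowed(RULES, "/blog/latest-post"))      # True
```

Passing the two-line site-wide block (User-agent: MojeekBot, Disallow: /) to the same helper returns False for every path.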

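To let MojeekBot crawl everything except specific file types, you can use pattern-matching characters: * matches any sequence of characters, and $ anchors the match to the end of the URL. The following rule blocks every URL that ends in .pdf: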

User-agent: MojeekBot
Disallow: /*.pdf$
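
To see which URLs a wildcard rule like this would cover, the pattern can be translated into a regular expression. The sketch below follows the common interpretation of * and $ described in the Robots Exclusion Protocol (RFC 9309). The function names are assumptions for this example, and it is not confirmed here that MojeekBot evaluates patterns in exactly this way. Python's urllib.robotparser does not handle these wildcards, which is why the matching is written by hand.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn each * into "any sequence of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without a trailing $, the rule only needs to match a prefix of the path.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if the robots.txt pattern applies to the given path."""
    return rule_to_regex(pattern).match(path) is not None


print(rule_matches("/*.pdf$", "/docs/guide.pdf"))             # True
print(rule_matches("/*.pdf$", "/docs/guide.pdf?download=1"))  # False
print(rule_matches("/*.pdf$", "/docs/guide.html"))            # False
```

Because of the trailing $, a PDF URL with a query string is not covered by the rule. Dropping the $ would block any URL that contains .pdf after the leading slash.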

Blocking MojeekBot means your content won't appear in Mojeek search results, which could reduce your site's visibility to users of this search engine. However, if you're concerned about server load or have content you don't want indexed, controlling MojeekBot's access through robots.txt provides a standard and effective method.

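For websites looking to manage crawler traffic, a properly configured robots.txt file offers a balance between visibility and control over how content is discovered and indexed. Keep in mind that robots.txt is a publicly readable, advisory file rather than an access control mechanism, so sensitive information should also be protected by authentication.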
