What is TurnitinBot?

TurnitinBot is a specialized web crawler operated by Turnitin LLC, a company that provides academic integrity solutions to educational institutions worldwide. This bot systematically crawls the internet to collect publicly available content for Turnitin's plagiarism detection service. It identifies itself in server logs with user agent strings such as TurnitinBot/2.1 (http://www.turnitin.com/robot/crawlerinfo.html) or similar versions.
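
To spot these requests in your own logs, a small script can look for the bot's token in the user agent field. The following is a minimal sketch in Python, not an official tool: the function name `parse_turnitinbot` is hypothetical, and the pattern is an assumption based on the example string above. It matches the `TurnitinBot` token rather than any fixed version number.

```python
import re
from typing import Optional

# Matches the "TurnitinBot" token and an optional "/<version>" suffix.
# Assumption: the pattern is inferred from the example user agent string.
_TURNITINBOT = re.compile(r"TurnitinBot(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)


def parse_turnitinbot(user_agent: str) -> Optional[str]:
    """Return the version ("" if none is given) when the user agent
    names TurnitinBot, or None when it does not."""
    match = _TURNITINBOT.search(user_agent)
    if match is None:
        return None
    return match.group(1) or ""
```

User agent strings can be set to anything by the client, so a match only tells you what the request claims to be.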

As a focused web crawler, TurnitinBot is designed to extract text content rather than render full web pages. It doesn't execute JavaScript, process CSS, or support cookies; it collects only text-based content for its database. The bot attributes itself to Turnitin in its user agent string, which includes a link to its documentation.

TurnitinBot operates from a pool of IP addresses managed by Turnitin and follows the Robots Exclusion Protocol, checking robots.txt files before crawling websites. The collected content becomes part of Turnitin's reference database used for comparing student submissions against internet sources.

Why is TurnitinBot crawling my site?

TurnitinBot visits websites to gather text content that might be used in academic work. If you're seeing this bot in your logs, it's likely indexing your public content to add to Turnitin's comparison database. The bot prioritizes content with academic relevance, including educational resources, scholarly articles, research papers, and other text-based materials that students might reference or copy.

The frequency of visits depends on your site's content and how often it changes. Sites with academic content or frequently updated materials may see more regular visits. TurnitinBot's crawling is generally regarded as legitimate because the bot honors robots.txt and fetches only publicly accessible content. The bot isn't targeting your site specifically; its visits are part of a broader effort to maintain a comprehensive database of online content for plagiarism detection.

What is the purpose of TurnitinBot?

TurnitinBot supports Turnitin's plagiarism detection service by building and maintaining a comprehensive database of online content. When students submit papers through Turnitin's system, their work is compared against this database to identify potential matches with existing online sources. This helps educational institutions maintain academic integrity by detecting instances where students may have copied content without proper attribution.

The data collected by TurnitinBot is used solely for comparison purposes within Turnitin's plagiarism detection service. The bot doesn't store complete copies of websites but rather creates text fingerprints that can be used for similarity matching. While website owners don't directly benefit from this crawling, the broader academic ecosystem benefits from tools that help maintain academic integrity and original scholarship.
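
How Turnitin actually builds its fingerprints is not described here, so the following is only a generic illustration of the idea and not Turnitin's method. The sketch hashes overlapping runs of words (often called shingles) and compares two texts by the overlap of their hash sets (Jaccard similarity). The function names, the shingle size of 3, and the use of truncated SHA-1 hashes are all assumptions chosen for the example.

```python
import hashlib
import re
from typing import Set


def fingerprints(text: str, k: int = 3) -> Set[str]:
    """Hash every run of k consecutive words (a "shingle") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    # Only the hashes are kept, not the words themselves.
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two fingerprint sets, from 0.0 to 1.0."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        return 0.0  # a text shorter than k words has no fingerprints
    return len(fa & fb) / len(fa | fb)
```

Because only hashes are stored, two texts can be compared for shared passages without keeping a full copy of either one, which is the general property the paragraph above describes.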

How do I block TurnitinBot?

TurnitinBot respects the standard Robots Exclusion Protocol, which means you can control its access to your site using your robots.txt file. To completely block TurnitinBot from crawling your entire site, add the following to your robots.txt file:

User-agent: TurnitinBot
Disallow: /

If you want to block access to specific sections while allowing access to others, you can use more targeted directives:

User-agent: TurnitinBot
Disallow: /private/
Disallow: /drafts/
Allow: /

This prevents TurnitinBot from accessing content under /private/ and /drafts/ while allowing it to crawl the rest of your site. The Allow: / line is optional, because any path not matched by a Disallow rule is allowed by default. Blocking TurnitinBot means your content won't be included in Turnitin's comparison database, so plagiarism of your content could go undetected in academic settings. However, if your site contains sensitive information or you're concerned about bandwidth usage, blocking may be appropriate. For more information about TurnitinBot, visit the crawler information page linked in its user agent string.
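
Before deploying rules like the ones above, it can help to check how a parser reads them. The sketch below uses Python's standard `urllib.robotparser` module. The helper name `is_allowed` and the example.com URLs are hypothetical, and the result reflects how Python's parser evaluates the rules (in file order, first match wins), which is an assumption about TurnitinBot's own matching rather than a confirmed description of it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate a robots.txt body for one user agent and one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two rule sets shown above, as strings.
PARTIAL_BLOCK = """\
User-agent: TurnitinBot
Disallow: /private/
Disallow: /drafts/
Allow: /
"""

FULL_BLOCK = """\
User-agent: TurnitinBot
Disallow: /
"""

# Hypothetical URLs, used only to exercise the rules.
print(is_allowed(PARTIAL_BLOCK, "TurnitinBot", "https://example.com/private/notes.html"))
print(is_allowed(PARTIAL_BLOCK, "TurnitinBot", "https://example.com/articles/a.html"))
print(is_allowed(FULL_BLOCK, "TurnitinBot", "https://example.com/"))
```

Under these assumptions the first and third checks report a blocked URL and the second reports an allowed one. Other crawlers are unaffected, since neither rule set names them.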
