panscient.com bot

What is panscient.com?

Panscient.com is a large-scale web crawler operated by Panscient, a company that specializes in vertical search, people search, company search, information extraction, business intelligence, lead generation, and machine learning. The crawler systematically visits millions of websites to collect specific types of business and professional information, using patented technology to extract data from corporate websites; that data is then licensed to authorized resellers and third parties for business intelligence, marketing, and data management purposes. In server logs, the crawler identifies itself with the user-agent string panscient.com or pantest. Like the major search engine crawlers, it accesses publicly available web content and respects the standard robots exclusion protocol. It is designed to request pages no more than once per second from the same domain name or IP address, which makes it relatively unobtrusive compared with more aggressive crawlers.
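To check whether Panscient is visiting your site and whether it stays near the stated pace of one request per second, you can scan your access logs for the two user-agent tokens named above. The sketch below is illustrative only: it assumes the Apache/Nginx "combined" log format, and the function names and sample lines are hypothetical. Keep in mind that any client can claim any user-agent string, so a match only shows what the client said it was.

```python
import re
from datetime import datetime

# Assumption: Apache/Nginx "combined" log format, e.g.
# 203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "panscient.com"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
PANSCIENT_TOKENS = ("panscient.com", "pantest")


def panscient_hits(lines):
    """Yield (timestamp, ip, request, status) for log lines whose user agent mentions Panscient."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        agent = m.group("agent").lower()
        if any(token in agent for token in PANSCIENT_TOKENS):
            ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m.group("ip"), m.group("request"), int(m.group("status"))


def min_interval_seconds(lines):
    """Shortest gap in seconds between consecutive Panscient requests, or None if fewer than two."""
    times = sorted(ts for ts, _, _, _ in panscient_hits(lines))
    if len(times) < 2:
        return None
    return min((later - earlier).total_seconds() for earlier, later in zip(times, times[1:]))


# Hypothetical sample log lines for illustration.
sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "panscient.com"',
    '203.0.113.5 - - [10/Oct/2024:13:55:38 +0000] "GET /about HTTP/1.1" 200 900 "-" "panscient.com"',
    '198.51.100.7 - - [10/Oct/2024:13:55:39 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(min_interval_seconds(sample))  # 2.0
```

A result well below 1.0 over a large log would suggest either a different client using the same user-agent string or a crawl rate other than the one Panscient describes.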

Why is panscient.com crawling my site?

Panscient primarily crawls websites looking for corporate information, including company names, addresses, executive biographies, job openings, and product information. It also collects genealogy data such as birth, marriage, and death records, obituaries, and census records. If you’ve registered a domain name, particularly a .com domain, Panscient likely discovered your site through publicly available domain registration lists, and its crawler periodically checks registered domains for business information. The crawler may also try to extract links from JavaScript and other scripting languages on your site, which sometimes produces requests for pages that don’t exist. These requests are misinterpretations of script content, not attempts to circumvent security. According to Panscient, the crawler accesses only publicly available information and doesn’t collect private or sensitive personal information such as Social Security numbers, driver’s license numbers, or dates of birth.
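Requests for nonexistent pages show up in access logs as 404 responses to the Panscient user agent. As a rough sketch, assuming the combined log format (request line as the first quoted field, user agent as the last) and a hypothetical helper name, you could tally those paths to see whether they look like fragments of script code rather than probes of sensitive URLs:

```python
import re
from collections import Counter

# Assumption: combined log format; the request line is the first quoted field
# and the user agent is the last quoted field on each line.
LOG_RE = re.compile(
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"\s*$'
)


def panscient_404_paths(lines):
    """Count paths that a Panscient user agent requested and the server answered with 404."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # not in the assumed format
        agent = m.group("agent").lower()
        is_panscient = "panscient.com" in agent or "pantest" in agent
        if is_panscient and m.group("status") == "404":
            counts[m.group("path")] += 1
    return counts


# Hypothetical sample: the second line mimics a link pulled out of JavaScript source.
sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "panscient.com"',
    '203.0.113.5 - - [10/Oct/2024:13:55:37 +0000] "GET /\'+page+\'.html HTTP/1.1" 404 0 "-" "panscient.com"',
    '198.51.100.7 - - [10/Oct/2024:13:55:38 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
for path, count in panscient_404_paths(sample).most_common():
    print(count, path)
```

Paths containing quote characters, plus signs, or variable names are a typical sign of text lifted from a script rather than a deliberate request.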

What is the purpose of panscient.com?

Panscient uses its crawler to build specialized vertical search engines and business intelligence databases. The company collects professional business contact information from US-based corporate websites, including names, titles, addresses, phone numbers, email addresses, domain names, and URLs. It also gathers background information about company management and employees, such as educational and career history and membership in external organizations. This information is licensed to authorized resellers and third-party businesses for business intelligence, marketing, and data management purposes, helping these clients identify potential business contacts, generate leads, and understand business relationships and organizational structures. Panscient’s crawler serves a legitimate business purpose, but website owners should be aware that the information it collects may be commercially distributed.

How do I block panscient.com?

Panscient’s web crawler respects the Robots Exclusion Standard (robots.txt), so controlling its access to your website is relatively easy. To exclude the Panscient crawler from your entire site, add the following entry to the robots.txt file at the root of your domain:

User-Agent: panscient.com
Disallow: /
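Before deploying the change, you can sanity-check how a standards-following parser reads the rule. This sketch uses Python’s standard urllib.robotparser module, so it shows that module’s interpretation of the file, not necessarily Panscient’s own implementation; example.com and SomeOtherBot are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The full-site entry from above, supplied as a list of lines.
ROBOTS_TXT = """\
User-Agent: panscient.com
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# Panscient is blocked everywhere; a crawler with no matching group is not affected.
print(parser.can_fetch("panscient.com", "https://example.com/"))       # False
print(parser.can_fetch("panscient.com", "https://example.com/about"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/about"))   # True
```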

If you only want to block specific directories or files, list just those paths in your robots.txt file instead:

User-Agent: panscient.com
Disallow: /private/
Disallow: /confidential-file.html
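The same standard-library parser can confirm that only the listed paths are affected. As before, this is a sketch of Python’s interpretation of the rules, with example.com and the sample paths chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# The partial-block entry from above; example.com is a placeholder domain.
ROBOTS_TXT = """\
User-Agent: panscient.com
Disallow: /private/
Disallow: /confidential-file.html
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# Only the listed directory and file are off-limits; everything else stays crawlable.
for path in ("/", "/private/report.pdf", "/confidential-file.html", "/products/"):
    allowed = parser.can_fetch("panscient.com", "https://example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Disallow values are prefix matches, so /private/ covers everything beneath that directory, such as /private/report.pdf.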

The Panscient crawler also obeys robots meta tag directives such as “noindex” and “nofollow”, which can be placed in the <head> section of individual web pages. This gives you page-level control: “noindex” keeps a page out of the index, and “nofollow” tells the crawler not to follow the links on that page.
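A tag of the form <meta name="robots" content="noindex, nofollow"> in the page’s <head> carries both directives. To verify which directives a page actually declares, the sketch below uses Python’s standard html.parser module; it checks only the generic name="robots" tag, since the source does not say whether Panscient also honors a crawler-specific name. The class name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute values can be None for valueless attributes
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


page = """<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex, nofollow">
  <title>Team</title>
</head>
<body>...</body>
</html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['nofollow', 'noindex']
```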

If you want all of your business contact information removed from Panscient’s database, email crawler@panscient.com. Individuals whose professional business contact information appears in Panscient’s data can request to be excluded by emailing privacy@panscient.com. Blocking the crawler prevents future collection but doesn’t remove data that has already been collected; for that, you must specifically request removal.
