What is Baiduspider?
Baiduspider is the umbrella term for Baidu's official web crawler, encompassing three specialized types: a desktop crawler that mimics desktop user behavior, a mobile crawler for mobile device simulation, and a mini-program crawler designed for interactions within the Baidu mobile app.
Your website is likely to be visited by both the Baiduspider Desktop and Baiduspider Mobile, while Baiduspider Mini-Program focuses on your mini-program.
To distinguish between these crawler types, examine the user agent string in their requests. Importantly, all variations of Baiduspider follow the rules specified for the Baiduspider token in the robots.txt file. A directive applied to Baiduspider therefore affects every type, so you cannot target only the mobile or only the desktop crawler through robots.txt.
What are the Baiduspider User-Agents?
Baiduspider sends the following user agent strings, grouped by crawler type:
Baidu Mobile
1. Mozilla/5.0(Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko)Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0;+http://www.baidu.com/search/spider.html)
2. Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;+http://www.baidu.com/search/spider.html)
Baidu PC
1. Mozilla/5.0(compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
2. Mozilla/5.0(compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
Baidu Mini-Program
Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;Smartapp; +http://www.baidu.com/search/spider.html)
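As a rough illustration of telling the crawler types apart by user agent, the sketch below classifies a string using only the markers visible in the examples above: the "Smartapp" token for the mini-program crawler and the word "Mobile" for the mobile crawler. The function name classify_baiduspider and the matching rules are assumptions made for this example, not an official Baidu specification, and a user agent match alone does not prove a request is genuine.

```python
from typing import Optional


def classify_baiduspider(user_agent: str) -> Optional[str]:
    """Guess the Baiduspider type from a user agent string.

    Hypothetical helper: the rules are inferred from the sample
    strings listed above and may not cover every variant.
    Returns "mini-program", "mobile", "desktop", or None when the
    string does not mention Baiduspider at all.
    """
    ua = user_agent.lower()
    if "baiduspider" not in ua:
        return None
    # The mini-program string also contains "Mobile", so check it first.
    if "smartapp" in ua:
        return "mini-program"
    if "mobile" in ua:
        return "mobile"
    return "desktop"
```

Checking "Smartapp" before "Mobile" matters because the mini-program user agent is otherwise identical to the mobile rendering one.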
How to Verify Baiduspider?
Because the user agent of an HTTP request is easy to spoof, you should verify that a request claiming to be Baiduspider is genuine. This takes two steps:
Step 1 - Conduct a Reverse DNS Lookup
Start by running a reverse DNS lookup on the IP address found in your server logs, and check whether the returned domain name ends with *.baidu.com or *.baidu.jp. Here's how you can perform this lookup on different operating systems:
Linux: Use the host command to carry out a reverse DNS lookup and confirm whether the request originates from a genuine Baiduspider.
Windows: Use the nslookup command to look up the IP address.
macOS: Use the dig command (with the -x option for a reverse lookup) to look up the IP address.
Step 2 - Execute a Forward DNS Lookup
After identifying the domain name from Step 1, run a forward DNS lookup on that domain name using the host command. This step ensures that the domain name resolves back to the original IP address logged by your server.
Here's an example to guide you through the process:
> host 123.206.198.68
68.198.206.123.in-addr.arpa domain name pointer baiduspider-123-206-198-68.crawl.baidu.com.
> host baiduspider-123-206-198-68.crawl.baidu.com
baiduspider-123-206-198-68.crawl.baidu.com has address 123.206.198.68
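The two steps can also be automated. The following is a minimal sketch in Python, assuming the standard library's socket.gethostbyaddr and socket.gethostbyname_ex are acceptable resolvers for your environment. The function name is_genuine_baiduspider and the injectable reverse_lookup and forward_lookup parameters are hypothetical, introduced here so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Hostname suffixes named in Step 1.
ALLOWED_SUFFIXES = (".baidu.com", ".baidu.jp")


def _reverse(ip: str) -> str:
    # Reverse DNS: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    # Forward DNS: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(hostname)[2]


def is_genuine_baiduspider(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse,
    forward_lookup: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if both DNS checks pass for the given IP."""
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    # Step 1: the hostname must end with an allowed Baidu suffix.
    if not hostname.endswith(ALLOWED_SUFFIXES):
        return False
    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False
    # Step 2: the hostname must resolve back to the original IP.
    return ip in addresses
```

Calling is_genuine_baiduspider("123.206.198.68") with the default resolvers performs live DNS queries, so in a request path you would likely want to cache the result per IP address.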
How to Block Baiduspider?
To prevent Baiduspider from crawling your website, add the following lines to your robots.txt file:
User-Agent: Baiduspider
Disallow: /
These directives tell Baiduspider, and only Baiduspider, not to crawl any part of your site.
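To sanity-check the rule before deploying it, you can run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module. The helper name is_allowed is hypothetical, and the result reflects how Python's parser interprets the file, which is an approximation and not necessarily identical to Baidu's own handling.

```python
from urllib.robotparser import RobotFileParser

# The rule from the section above.
ROBOTS_TXT = """\
User-Agent: Baiduspider
Disallow: /
"""


def is_allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no HTTP request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

With this rule, is_allowed("Baiduspider", "/") returns False while is_allowed("Googlebot", "/") returns True, since no group in the file applies to other crawlers.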
If you're encountering issues with Baiduspider, or if you have any questions regarding SEO practices related to Baidu, don't hesitate to reach out to us for assistance.