SEO SHIFU

What is Baiduspider?

Baiduspider is the umbrella term for Baidu's official web crawler, encompassing three specialized types: a desktop crawler that mimics desktop user behavior, a mobile crawler for mobile device simulation, and a mini-program crawler designed for interactions within the Baidu mobile app.

Your website is likely to be visited by both Baiduspider Desktop and Baiduspider Mobile, while Baiduspider Mini-Program crawls only your mini-program.

To tell these crawler types apart, examine the User-Agent string in their requests. All variants of Baiduspider obey the rules specified for the Baiduspider token in the robots.txt file, so a directive applied to Baiduspider affects every type. As a result, you cannot target only the mobile or only the desktop crawler through robots.txt.

What are the Baiduspider User-Agents?

Baiduspider has three different user agents, as below:

Baidu Mobile

See this content in the original post

Baidu PC

See this content in the original post

Baidu Mini-Program

See this content in the original post

How to Verify Baiduspider?

Because the User-Agent header of an HTTP request is easy to spoof, a request that claims to be Baiduspider is not necessarily genuine. You can verify its authenticity in two steps:

Step 1 - Conduct a Reverse DNS Lookup

Run a reverse DNS lookup on the IP address found in your server logs and check that the returned hostname matches either *.baidu.com or *.baidu.jp. The command to use depends on your operating system:

  • Linux: Use the host command on the IP address to run the reverse DNS lookup.

  • Windows: Use the nslookup command on the IP address.

  • macOS: Use the dig command with the -x option to run the reverse DNS lookup.
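
Whichever tool performs the lookup, the check on the returned name is the same. The following is a minimal Python sketch of that check. The function name is_baidu_hostname and the sample hostnames in the usage note are illustrative assumptions, not names published by Baidu.

```python
def is_baidu_hostname(hostname: str) -> bool:
    """Return True if hostname matches *.baidu.com or *.baidu.jp."""
    # Reverse lookups may return mixed case or a trailing dot, so normalize.
    name = hostname.rstrip(".").lower()
    # Compare against the suffix including its leading dot, so that a
    # look-alike such as "notbaidu.com" is rejected.
    return name.endswith((".baidu.com", ".baidu.jp"))
```

For example, a hypothetical name such as "spider-1.crawl.baidu.com" passes, while "notbaidu.com" and "baidu.com.example.net" do not. Matching on the full suffix rather than searching for "baidu" anywhere in the name is the point of the sketch.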

Step 2 - Execute a Forward DNS Lookup

Take the hostname returned in Step 1 and run a forward DNS lookup on it, for example with the host command. The hostname should resolve back to the original IP address logged by your server. If it does not, treat the request as unverified.

Here's an example to guide you through the process:

See this content in the original post
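
The two steps can also be scripted. The sketch below is one possible way to do it in Python using the standard socket module, not an official Baidu tool. The function names, the injectable lookup parameters, and the sample data in the usage note are assumptions made for illustration. Note that socket.gethostbyname_ex handles IPv4 only.

```python
import socket
from typing import Callable, List, Optional

ALLOWED_SUFFIXES = (".baidu.com", ".baidu.jp")


def reverse_lookup(ip: str) -> Optional[str]:
    """Step 1: reverse DNS lookup; returns None if the IP has no hostname."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname: str) -> List[str]:
    """Step 2: forward DNS lookup; returns the hostname's IPv4 addresses."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def verify_baiduspider(
    ip: str,
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Return True only if the IP passes both DNS checks."""
    hostname = reverse(ip)
    if not hostname:
        return False
    name = hostname.rstrip(".").lower()
    # The hostname must fall under baidu.com or baidu.jp.
    if not name.endswith(ALLOWED_SUFFIXES):
        return False
    # The hostname must resolve back to the IP seen in the server logs.
    return ip in forward(name)
```

Calling verify_baiduspider("203.0.113.5") performs live DNS queries for that address (a documentation-range IP used here purely as a placeholder). The lookup functions are passed as parameters so the logic can be exercised with canned data instead of the network.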

How to Block Baiduspider?

To prevent Baiduspider from crawling your website, add the following lines to your robots.txt file:

User-Agent: Baiduspider
Disallow: /

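These directives tell Baiduspider, and only Baiduspider, not to crawl any part of your site. Because all Baiduspider types share this token, the rule covers the desktop, mobile, and mini-program crawlers alike.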
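
Before deploying the rule, you may want to sanity-check it locally. The sketch below uses Python's standard urllib.robotparser module. It shows how that parser interprets the two lines, which is an approximation and not a guarantee of how Baidu's own parser behaves. The example.com URL and the "ExampleBot" agent name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as in the robots.txt rule for Baiduspider.
ROBOTS_TXT = """\
User-Agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any path on the site is covered by "Disallow: /".
url = "https://example.com/any/page"

baidu_allowed = parser.can_fetch("Baiduspider", url)
other_allowed = parser.can_fetch("ExampleBot", url)

print("Baiduspider allowed:", baidu_allowed)  # False
print("ExampleBot allowed:", other_allowed)   # True
```

The parser is given the bare Baiduspider token rather than a full User-Agent header, since this module matches on the product name at the start of the string it receives.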

If you're encountering issues with Baiduspider, or if you have any questions regarding SEO practices related to Baidu, don't hesitate to reach out to us for assistance.