1. What is Baiduspider?
“Baiduspider” is the official name of Baidu’s web crawling spider. It crawls web pages and returns updates to the Baidu index.
2. What are the user-agents of Baiduspider?
|Baidu product|User-agent|
|---|---|
|Baidu Web/Mobile Search|Baiduspider (the same for web and mobile crawlers)|
|Baidu Image Search|Baiduspider-image|
|Baidu Video Search|Baiduspider-video|
|Baidu News Search|Baiduspider-news|
|Baidu Bookmark Search|Baiduspider-favo|
|Baidu Business Search|Baiduspider-ads|
|Baidu Union Search|Baiduspider-cpro|
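If you want to see which of these crawlers shows up in your access logs, a small lookup can help. The sketch below is only an illustration: the function name `classify_baidu_user_agent` is made up for this post, and it assumes the tokens listed above appear somewhere in the User-Agent header. It checks longer tokens first so that "Baiduspider-image" is not mistaken for plain "Baiduspider".

```python
# Product names and user-agent tokens, copied from the list above.
BAIDU_USER_AGENTS = {
    "Baiduspider-image": "Baidu Image Search",
    "Baiduspider-video": "Baidu Video Search",
    "Baiduspider-news": "Baidu News Search",
    "Baiduspider-favo": "Baidu Bookmark Search",
    "Baiduspider-ads": "Baidu Business Search",
    "Baiduspider-cpro": "Baidu Union Search",
    "Baiduspider": "Baidu Web/Mobile Search",
}


def classify_baidu_user_agent(user_agent):
    """Return the Baidu product a User-Agent header claims to be, or None.

    Hypothetical helper for log analysis; it only inspects the header text
    and does not prove the request really came from Baidu.
    """
    ua = user_agent.lower()
    # Longest tokens first, so the plain "Baiduspider" token is tried last.
    for token in sorted(BAIDU_USER_AGENTS, key=len, reverse=True):
        if token.lower() in ua:
            return BAIDU_USER_AGENTS[token]
    return None
```

Keep in mind that anyone can send a header containing "Baiduspider", so this only tells you what a visitor claims to be.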
3. I don’t want Baiduspider to index my website. What should I do?
If your website doesn’t target a Chinese audience, being crawled by Baiduspider may just waste your bandwidth. The good news is that Baiduspider obeys robots.txt, so a few simple rules in your robots.txt file can help. For example, if you don’t want your site to be indexed by Baiduspider, you can use the following:
a. To block all spiders from Baidu:
b. To block Baidu Video spiders:
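The rules for (a) and (b) do not appear in the text above, so here is a sketch of what they would conventionally look like in standard robots.txt syntax, using the user-agent names from question 2. Treat the exact rules as an assumption rather than a quote from Baidu. The Python wrapper (the helper `is_allowed` is a made-up name) just runs the rules through the standard library's `urllib.robotparser` so you can check them before deploying.

```python
from urllib.robotparser import RobotFileParser

# (a) Assumed rules to block all spiders from Baidu on the whole site.
BLOCK_ALL_BAIDU = """\
User-agent: Baiduspider
Disallow: /
"""

# (b) Assumed rules to block only the Baidu Video spider.
BLOCK_BAIDU_VIDEO = """\
User-agent: Baiduspider-video
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path="/"):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Baiduspider", "Baiduspider-video", "Googlebot"):
        print(agent, "(a):", is_allowed(BLOCK_ALL_BAIDU, agent),
              "(b):", is_allowed(BLOCK_BAIDU_VIDEO, agent))
```

Python's parser matches user-agent names by case-insensitive substring, which is why the "Baiduspider" group in (a) also covers names such as "Baiduspider-video" here. Other spiders, such as Googlebot, are unaffected by either rule set.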
4. How can I know if someone is faking Baiduspider to crawl my website?
a. On Linux:
Run a reverse DNS lookup on the IP address (for example, “host xxx.xxx.xxx.xxx”) and check whether the hostname is in the format “*.baidu.com”. If it is not, it is a fake Baiduspider.
b. On Windows:
Go to Start, then Run, and enter “tracert xxx.xxx.xxx.xxx” (replace the placeholder with the IP address). Then check whether the hostname shown is in the format “*.baidu.com”.
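If you would rather run this check from a script than by hand, the same idea can be sketched in Python. This is a minimal sketch, not an official Baidu tool: the function names are made up for this post, and the only rule applied is the “*.baidu.com” hostname format described above. The lookup function is passed in as a parameter so the logic can be tried without network access.

```python
import socket


def is_baidu_hostname(hostname):
    """Return True if hostname has the form *.baidu.com."""
    host = hostname.rstrip(".").lower()
    # Require the leading dot so "fakebaidu.com" does not pass.
    return host.endswith(".baidu.com")


def looks_like_baiduspider(ip, reverse_lookup=socket.gethostbyaddr):
    """Reverse-resolve ip and check the hostname against *.baidu.com.

    reverse_lookup defaults to the real DNS lookup; pass a stand-in
    function to test the logic offline.
    """
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        # No reverse DNS record, or the lookup failed: not verified.
        return False
    return is_baidu_hostname(hostname)
```

To use it, call `looks_like_baiduspider("xxx.xxx.xxx.xxx")` with the IP address taken from your server logs. A result of `False` means the visitor is either a fake Baiduspider or has no reverse DNS record at all.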
5. How does Baiduspider work?
When Baiduspider visits a web page, it (1) crawls the page and puts it in storage, and (2) adds the links on the page to its list of pages to check later. This is no different from the crawling behavior of other search engine bots, such as Googlebot. Baiduspider sets its crawl frequency based on server load, so it usually doesn’t cause load problems for the server.
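To make the two steps concrete, here is a toy crawl loop in Python. It is a generic illustration of "store the page, queue its links", not Baidu's actual implementation, and it leaves out the load-based crawl frequency entirely. The names `LinkExtractor` and `crawl`, and the `fetch` callback, are made up for this sketch; `fetch` stands in for an HTTP request so the example can run on in-memory pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Crawl breadth-first from start_url; fetch(url) returns HTML or None."""
    storage = {}                    # step 1: stored copies of crawled pages
    frontier = deque([start_url])   # step 2: links to check later
    seen = {start_url}
    while frontier and len(storage) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue                # page could not be fetched; skip it
        storage[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return storage
```

For a quick try, pass a dictionary's `get` method as `fetch`, with URLs as keys and HTML strings as values.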
6. Does Baiduspider prefer servers located in China?
Baiduspider accesses your website in much the same way as a real visitor. If a visitor based in China has fast access to the website, so does Baiduspider.
I am sure this post doesn’t cover every question you may have about Baidu and Baiduspider, so feel free to leave a comment 🙂
In the meantime, you can also contact Baidu directly by email at spide[email protected] if you have any indexing issues on your site.