Baidu’s New Crawler – You!
First I would like to clarify – this post is not about Baiduspider 2.0, which I already wrote about a while ago, but something much more fun.
In the SEO world, I think there is one basic fact that everybody agrees on – no crawling, no indexing.
This is not complicated at all – if a crawler doesn’t crawl your page, then the search engine doesn’t know what your page is, so it doesn’t build an index for the page.
But recently I have observed something interesting about Baidu:
I found that a page on my client’s website had been successfully indexed by Baidu, but surprisingly, there was no corresponding crawl record in the server log.
At first I was struck with a headful of question marks, maybe more than that. Then I dug deeper into the server log, looking for any request for that particular URL, but found only a single record:
GET /jzd/2011-5-21/stock-news-19893/ - 80 - 125.116.14.191 Mozilla/4.0+(compatible;+MSIE+6.0;+SV1;+QQPinyin+730;+BaiduGame)
This record may look confusing, so let me explain: it tells us the URL was visited only by a normal visitor, who was using IE 6.0 with Baidu Toolbar and the Baidu Game widget installed.
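For anyone who wants to repeat this check on their own logs, here is a minimal sketch in Python. It assumes the field order of the record quoted above (method, URL path, query, port, username, client IP, User-Agent) and treats a request as a declared Baidu crawl only when the User-Agent contains the "Baiduspider" token. The function names are my own, and the sample line writes the empty fields as plain hyphens.

```python
import re

# Tokens that mark a declared Baidu crawler in the User-Agent.
# Only "Baiduspider" is checked here; extending the tuple is up to you.
CRAWLER_TOKENS = ("baiduspider",)

# Assumed field order: method, path, query, port, username, client IP, User-Agent.
LOG_PATTERN = re.compile(
    r"^(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<query>\S+)\s+(?P<port>\d+)\s+"
    r"(?P<user>\S+)\s+(?P<ip>\S+)\s+(?P<agent>\S+)$"
)


def parse_record(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_declared_crawler(agent):
    """True if the User-Agent announces itself as Baiduspider."""
    agent = agent.lower()
    return any(token in agent for token in CRAWLER_TOKENS)


def crawler_hits(lines, path):
    """Collect the records for `path` that were requested by a declared crawler."""
    hits = []
    for line in lines:
        record = parse_record(line)
        if record and record["path"] == path and is_declared_crawler(record["agent"]):
            hits.append(record)
    return hits


sample = (
    "GET /jzd/2011-5-21/stock-news-19893/ - 80 - 125.116.14.191 "
    "Mozilla/4.0+(compatible;+MSIE+6.0;+SV1;+QQPinyin+730;+BaiduGame)"
)
record = parse_record(sample)
print(record["ip"], record["agent"])
# The only visit to this URL came from a browser, so no crawler hits are found.
print(crawler_hits([sample], "/jzd/2011-5-21/stock-news-19893/"))
```

Note that "BaiduGame" in the User-Agent does not count as a crawler here, because the check looks for the full "Baiduspider" token rather than any mention of Baidu.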
What does this mean?
It means any human who uses Baidu Toolbar can act as a “crawler” for Baidu!
If you install the toolbar and visit a page that Baidu has not crawled yet, Baidu may build an index for the page without Baiduspider ever crawling it – because you have already done the job for Baidu.
This is pure evil, maybe the most evil thing a search engine can do…
A message to the kids reading this blog: stay away from Baidu Toolbar, or any toolbar from a search engine, just in case.
Maybe what I have found here is just an exception – I will keep paying attention to this and post updates on this blog.
Date: May 27th | Topic: Baidu Search Engine Optimization | Author: Ryan

I think that IE does something like that itself, doesn’t it? http://searchengineland.com/google-bing-is-cheating-copying-our-search-results-62914
Also, I’ve noticed that Google will index some sites I create before I link to them. Even basic Drupal installs with no content are sometimes indexed right away. I use Chrome to set up such sites.
Hi Tait,
I am quite sure that what Bing Toolbar does, according to the article, is different from the situation described in my post.
According to the article from Search Engine Land, Bing Toolbar may collect user information, which may then be used to affect the rankings of websites on Bing. But what Baidu does with its toolbar, according to my observation, goes a step further – it sends user information directly back to Baidu’s servers, where it is used to build an index of the page.
This sounds pure evil to me. I have asked around to see if my friends have seen this before, and they will get back to me soon.
As for the super quick indexing of Drupal and other open source CMSs, most of them ship with a default ping module, which can notify search engines whenever something is created on your site.
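To give a rough idea of what such a ping looks like, here is a small Python sketch. It assumes the CMS uses the common XML-RPC call named weblogUpdates.ping, which takes the site name and the site URL; whether a particular install does this, and which service it pings, is an assumption on my part. The site name, URL and endpoint below are placeholders, and the sketch only builds the request body without sending anything.

```python
import xmlrpc.client


def build_ping_request(site_name, site_url):
    """Serialise a weblogUpdates.ping call as an XML-RPC request body (nothing is sent)."""
    return xmlrpc.client.dumps((site_name, site_url), methodname="weblogUpdates.ping")


payload = build_ping_request("Example site", "http://example.com/")
print(payload)

# Actually sending the ping would look roughly like this.
# The endpoint is a placeholder, not a real ping service:
# server = xmlrpc.client.ServerProxy("http://ping.example.com/RPC2")
# server.weblogUpdates.ping("Example site", "http://example.com/")
```

Once a ping service receives a call like this, it knows the site has something new, which is why a fresh install can be picked up before anyone links to it.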