Baidu’s New Crawler – You!

Posted by Ryan on 2011-05-27

First I would like to clarify – this post is not about Baiduspider 2.0, which I already wrote about a while ago, but something much more fun.

In the SEO world, I think there is one basic fact that everybody agrees on – no crawling, no indexing.

This is not complicated at all – if a crawler doesn’t crawl your page, then the search engine doesn’t know what your page is, so it doesn’t build an index for the page.

But recently I have observed something interesting about Baidu:

I found that a page on my client’s website had been successfully indexed by Baidu, but surprisingly, there was no corresponding crawl record in the server log.

At first my head was full of question marks. Then I dug deeper into the server log, looking for any request to that particular URL, and found only this single record:

GET /jzd/2011-5-21/stock-news-19893/ – 80 – Mozilla/4.0+(compatible;+MSIE+6.0;+SV1;+QQPinyin+730;+BaiduGame)

This record may seem confusing, so let me explain it: it tells us the URL was visited only by a normal human visitor, who was using IE 6.0 with Baidu Toolbar plus the Baidu Game widget installed. There is no Baiduspider user agent anywhere in the log for this URL.
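The log check described above can be sketched in a few lines of Python. This is only a rough illustration, not the exact procedure I used: the function names are made up for this example, the match is a plain substring test on the user agent, the dashes in the log line are assumed to be plain hyphens, and the second sample line is a hypothetical Baiduspider request added purely for contrast.

```python
def requests_for_path(log_lines, path):
    """Return (line, is_baiduspider) for every log line that mentions `path`."""
    hits = []
    for line in log_lines:
        if path not in line:
            continue
        # Assumption: a crawler visit carries "Baiduspider" in its user agent.
        is_spider = "baiduspider" in line.lower()
        hits.append((line.strip(), is_spider))
    return hits


def crawled_by_baiduspider(log_lines, path):
    """True if at least one request for `path` came from Baiduspider."""
    return any(is_spider for _, is_spider in requests_for_path(log_lines, path))


# First line: the record from my log. Second line: hypothetical, for contrast only.
sample_log = [
    "GET /jzd/2011-5-21/stock-news-19893/ - 80 - "
    "Mozilla/4.0+(compatible;+MSIE+6.0;+SV1;+QQPinyin+730;+BaiduGame)",
    "GET /index.html - 80 - Baiduspider+(+http://www.baidu.com/search/spider.htm)",
]

for line, is_spider in requests_for_path(sample_log, "/jzd/2011-5-21/stock-news-19893/"):
    print("spider" if is_spider else "human?", line)
```

If `crawled_by_baiduspider` returns False for a URL that Baidu has nevertheless indexed, you are looking at the same situation I ran into. Keep in mind that user agents can be faked, so a check like this is a hint, not proof.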

What does this mean?

This means any human who uses Baidu Toolbar can be a “crawler” for Baidu!

If you install the toolbar and visit a page that Baidu has not crawled yet, then Baidu may build an index for that page without Baiduspider ever crawling it – because you have already done the job for Baidu.

This is pure evil, maybe the most evil thing a search engine can do…

A message to the kids reading this blog: stay away from Baidu Toolbar, or any toolbar from a search engine, just in case.

Maybe what I have found here is just an exception, since it rests on a single log record – I will keep paying attention to this and post updates on this blog.

