Baiduspider, Googlebot and How Differently They Behave on Your Site

Posted by Ryan on 2011-02-23

As a search engine marketer, one of my favorite parts of SEO is studying crawler behavior. It can be a lot of fun (at least to me), because when I do it I imagine myself studying real insects and, yeah, boys love insects!

(Never tried to dismember crawlers like I did to insects though…)

We know that different search engines send out their own web crawlers. Today we are going to look at Baiduspider and Googlebot, the crawlers (user agents) sent out by Baidu and Google respectively.

How did I do it?

I got the detailed crawler behavior data from a month’s worth of server logs from my friend’s website. As covered in a previous post of mine, Server Logs, My Precious, server logs are always of great importance to SEO. They can be considered a gold mine holding almost all the important information: not only the strengths and weaknesses of your current SEO status, but also a lot more, such as visitor behavior and whether your site has been attacked or cloned.
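If you want to start digging into your own logs, here is a minimal Python sketch of the first step: counting requests per crawler. It assumes the common "combined" access log format, where the user agent is the last quoted field on each line. The function name `count_crawler_hits` and the sample lines are made up for illustration, and raw request counts are not the same thing as visits or time on site.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Substrings used to recognise each crawler in the user-agent string.
CRAWLERS = ("Baiduspider", "Googlebot")


def count_crawler_hits(lines):
    """Count requests per crawler by matching its name in the user agent."""
    hits = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end with a quoted field
        user_agent = match.group(1).lower()
        for name in CRAWLERS:
            if name.lower() in user_agent:
                hits[name] += 1
                break
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [01/Jan/2011:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '5.6.7.8 - - [01/Jan/2011:00:00:02 +0000] "GET /a/b/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '9.9.9.9 - - [01/Jan/2011:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]
hits = count_crawler_hits(sample)
print(dict(hits))
```

Matching on the user-agent string alone is the simplest approach, but anyone can fake a user agent, so treat the counts as a first approximation.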

What I found:

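| Agent       | Visits | Total Time Spent on Site (hr) | Pages Crawled | %      |
|-------------|--------|-------------------------------|---------------|--------|
| Baiduspider | 3741   | 2397.240                      | 18973         | 13.938 |
| Googlebot   | 356    | 620.226                       | 28228         | 20.737 |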

*The server logs this data was gathered from cover a period before the website had been optimized, so the figures you derive from your own server logs may show these crawlers working much more efficiently.

The data can tell us a couple of interesting things:

  1. Baiduspider visits much more often than Googlebot (in our case, more than 10 times as often);
  2. Baiduspider spends more time on the site (in our case, almost 4 times as much);
  3. Baiduspider’s crawling efficiency is poor: despite spending more time on my friend’s website, it crawled only about 2/3 as many pages as Googlebot.
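If you want to check the arithmetic behind these three points, the short Python sketch below recomputes the ratios from the table figures. The pages-per-hour rate at the end is an extra derived number I am adding for illustration; it does not come from the log report itself.

```python
# Figures copied from the table above.
stats = {
    "Baiduspider": {"visits": 3741, "hours": 2397.240, "pages": 18973},
    "Googlebot": {"visits": 356, "hours": 620.226, "pages": 28228},
}

baidu, google = stats["Baiduspider"], stats["Googlebot"]

visit_ratio = baidu["visits"] / google["visits"]  # about 10.5
time_ratio = baidu["hours"] / google["hours"]     # about 3.9
page_ratio = baidu["pages"] / google["pages"]     # about 0.67

# Assumption: pages crawled per hour on site as a rough efficiency measure.
baidu_rate = baidu["pages"] / baidu["hours"]      # about 7.9
google_rate = google["pages"] / google["hours"]   # about 45.5

print(f"visits: {visit_ratio:.1f}x, time: {time_ratio:.1f}x, pages: {page_ratio:.2f}x")
print(f"pages per hour: Baiduspider {baidu_rate:.1f}, Googlebot {google_rate:.1f}")
```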

Now, dear SEO folks, think about what hidden information lies behind the facts listed above, and make whatever assumptions you think reasonable 🙂

When I first saw this, my conclusion was that Baiduspider crawled the same pages again and again on my friend’s website, while Googlebot managed to find more fresh content.

I guessed it was due to URL depth, and my further findings proved I was right.

In the server logs, I found that the deeper a URL sits in the site hierarchy, the more sharply Baiduspider’s crawling drops off. On my friend’s website, Baiduspider visited the root directory 111 times in the month while Googlebot visited it only 18 times; for a third-level directory, Baiduspider visited only once but Googlebot went there 9 times.
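To run the same kind of depth comparison on your own site, you could tally crawler requests by directory depth. The Python sketch below is one way to do it, under my own assumptions: the root directory counts as depth 0, and a last path segment without a trailing slash is treated as a file rather than a directory. The helper names `url_depth` and `hits_by_depth` and the sample requests are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def url_depth(url):
    """Return the directory depth of a URL path: '/' is 0, '/a/b/' is 2."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    # Assumption: a last segment without a trailing slash is a file, not a directory.
    if segments and not path.endswith("/"):
        segments.pop()
    return len(segments)


def hits_by_depth(requests):
    """Tally requests per crawler and depth from (crawler_name, url) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for crawler, url in requests:
        table[crawler][url_depth(url)] += 1
    return table


# Made-up requests, for illustration only.
sample = [
    ("Baiduspider", "/"),
    ("Baiduspider", "/"),
    ("Googlebot", "/"),
    ("Googlebot", "/products/widgets/blue/"),
]
result = hits_by_depth(sample)
for crawler, depths in result.items():
    print(crawler, dict(depths))
```

Comparing the per-depth counts side by side should show whether a crawler's interest falls off as the directories get deeper.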

Generally speaking, Baiduspider’s affection for flat websites is much stronger than Googlebot’s; put another way, Googlebot is much better than Baiduspider at digging deeper into a site.

Blah-free takeaway for non-geek readers:

Baidu favors flat websites more than Google does.
