HTTP status codes are predefined three-digit integers that web servers send in responses to indicate the outcome of a request.
For example, 200 means the server successfully returned the requested resource, such as a web page, an image, or a CSS file; 404 means the requested resource doesn't exist; 403 means access to the requested resource is forbidden; 500 means something went wrong on the server; and 503 means the server is temporarily unavailable.
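All of these codes follow the same first-digit convention (2xx success, 3xx redirection, 4xx client error, 5xx server error), so a few lines of Python are enough to classify any code. The helper below is just an illustrative sketch of that convention, not part of any standard library:

```python
def status_class(code: int) -> str:
    """Classify an HTTP status code by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

print(status_class(200))  # success
print(status_class(404))  # client error
print(status_class(503))  # server error
```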
You can find a complete list of HTTP response codes here on the W3C website.
If you are an SEO and don't want to wade through all the status codes, Moz has put together a list specifically for SEOs here, which is more visually appealing and easier to read.
None of the above should be news to online marketers and web developers. But how do search engine crawlers actually handle these response codes? That was a question I had always wanted a definitive answer to, one I would have traded my iPad for.
Recently Baidu published an article on the Baidu Webmaster Help site giving a basic explanation of its web-crawling system. The most interesting part is its explanation of how Baiduspider handles 404, 503, 403, and 301:
If a URL returns 404, Baiduspider marks it as invalid and deletes it from the database. What's more, if Baiduspider later finds the same URL linked from other websites, it will still not crawl it, for an unspecified period of time.
If a URL that's already in Baidu's index returns 503, Baiduspider will not delete it from the index right away, but will request the URL a few more times in the future. If it's a new URL, Baidu will not index it, but will likewise try the URL a few more times. If the URL keeps returning 503, it will be marked as invalid and deleted from the database.
Baiduspider handles 403 the same way it handles 503: it keeps trying the URL a few times, and marks it as invalid if it keeps returning 403.
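Taken together, these rules read like a simple retry policy: 404 drops a URL immediately, while 403 and 503 get a few more chances before the URL is dropped. Here is a toy Python sketch of that policy. The function name, the retry threshold of 3, and the `status_history` representation are all my own assumptions for illustration; Baidu hasn't published these details:

```python
def crawl_decision(status_history):
    """Toy sketch of the crawl policy described above.

    status_history: HTTP status codes seen for one URL on
    successive crawl attempts. MAX_RETRIES = 3 is purely
    illustrative; Baidu does not publish the real number.
    """
    MAX_RETRIES = 3
    for attempt, code in enumerate(status_history, start=1):
        if code == 200:
            return "index"
        if code == 404:
            return "delete"      # dropped immediately
        if code in (403, 503) and attempt >= MAX_RETRIES:
            return "delete"      # same error too many times
    return "retry later"         # transient 403/503: try again

print(crawl_decision([404]))            # delete
print(crawl_decision([503, 503]))       # retry later
print(crawl_decision([503, 503, 503]))  # delete
print(crawl_decision([503, 200]))       # index
```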
Baidu recommends using 301 redirects when a website moves to new URLs or a new domain, and also using Baidu Webmaster Tools to help Baidu recognize the new URLs or domain faster.
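Serving a 301 comes down to returning that status code with a `Location` header pointing at the new URL. Here's a minimal sketch using Python's built-in `http.server`; the `OLD_TO_NEW` mapping and the example URLs are hypothetical, and in practice you'd configure this in your web server (Apache, nginx) rather than hand-roll it:

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from old paths to their new permanent homes.
OLD_TO_NEW = {"/old-page": "https://example.com/new-page"}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = OLD_TO_NEW.get(self.path)
        if target:
            self.send_response(301)               # permanent redirect
            self.send_header("Location", target)  # where the page lives now
            self.end_headers()
        else:
            self.send_response(404)               # anything else is gone
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet
```

Passing this handler to `http.server.HTTPServer` and calling `serve_forever()` would run it; a real site would drive the mapping from its CMS or a redirect table.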
Have your own insights into how search engine crawlers handle HTTP responses? Spotted interesting patterns in your web server logs? Sharing is awesome!