Play with GREP

Posted by Ryan on 2011-04-14

GREP is another geeky SEO tool, like the Lynx text browser I covered in my previous post. Ian Lurie has already written a post on exactly this topic, but I would still like to have a GREP post of my own.

This won't be a full post, though. Think of it as an incomplete grep user guide for SEOs:

1. combine server logs:

cat 1.log 2.log > target.log

The “cat” command is a simple but powerful way to combine server logs. If you keep periodic backups of your server logs, you can use it to merge several of them into one file for further analysis.
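Rotated logs are often gzip-compressed, and plain cat cannot read those. Here is a minimal sketch of merging a plain log with a compressed one. The sample lines and the 2.log.gz name are made up so the snippet runs anywhere, and it assumes gzip is installed.

```shell
# Work in a throwaway directory so no real logs are touched.
workdir=$(mktemp -d)

# Two tiny stand-ins for real server logs (hypothetical content).
printf 'hit from 1.log\n' > "$workdir/1.log"
printf 'hit from 2.log\n' > "$workdir/2.log"
gzip "$workdir/2.log"          # the older log becomes 2.log.gz

# cat copies the plain file; gzip -dc decompresses to standard output.
{ cat "$workdir/1.log"; gzip -dc "$workdir/2.log.gz"; } > "$workdir/target.log"

# Count the merged lines as a quick sanity check.
merged_lines=$(awk 'END { print NR }' "$workdir/target.log")
echo "target.log has $merged_lines lines"
```

If you use a wildcard such as *.log instead, give the merged file a name the wildcard does not match, or a target.log left over from an earlier run gets picked up as input.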

2. extract certain information from server logs. For example, let’s get all the lines about Googlebot:

grep "www.google.com/bot.html" 1.log > googlebot.log

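You can also use this to get all the 404 errors from the server log. A bare 404 would also match URLs or byte counts that happen to contain 404, so anchor it to the closing quote of the request line, which sits right before the status code in the common Apache log format:

grep '" 404 ' 1.log > 404.log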
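If your log follows the Apache combined format, another option is to test the status column itself with awk. This is only a sketch: the two sample lines are invented, and the assumption that the status code is the 9th whitespace-separated field holds only for that format.

```shell
# Hypothetical sample log: one real 404, one 200 whose URL and size contain "404".
log=$(mktemp)
cat > "$log" <<'EOF'
1.2.3.4 - - [14/Apr/2011:10:00:00 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"
1.2.3.4 - - [14/Apr/2011:10:00:01 +0000] "GET /page-404.html HTTP/1.1" 200 4042 "-" "Mozilla/5.0"
EOF

# Matching "404" anywhere on the line counts both lines.
naive=$(grep -c '404' "$log")

# Comparing field 9 (the status code) counts only the real 404.
strict=$(awk '$9 == 404 { n++ } END { print n + 0 }' "$log")

echo "lines containing 404 anywhere: $naive"
echo "lines with status 404:         $strict"
```

To save the matching lines rather than count them, awk '$9 == 404' 1.log > 404.log does the job.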

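Want to get the information about multiple bots? Use an extended regular expression. Leave no spaces around the | (they would become part of the pattern) and add -i to ignore case, because the user agents are written Googlebot and Slurp:

egrep -i "msnbot|slurp|googlebot" 1.log > bot.log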
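Once the bots are filtered, a natural next step is to count how many hits each one made. Here is a rough sketch with a made-up five-line log. The user agent strings are only sample data, and the bot names in the loop are the same three as above.

```shell
# Hypothetical sample log: 2 Googlebot hits, 1 msnbot, 1 Yahoo Slurp, 1 browser.
log=$(mktemp)
cat > "$log" <<'EOF'
66.249.66.1 - - [14/Apr/2011:10:00:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.3.2 - - [14/Apr/2011:10:00:05 +0000] "GET /about HTTP/1.1" 200 2048 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
72.30.1.9 - - [14/Apr/2011:10:00:09 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
10.0.0.7 - - [14/Apr/2011:10:00:12 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/4.0"
66.249.66.1 - - [14/Apr/2011:10:00:20 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# grep -c counts matching lines; -i ignores case ("Googlebot", "Slurp").
# "|| true" keeps the loop going when a bot has zero hits (grep exits 1).
report=$(for bot in msnbot slurp googlebot; do
  hits=$(grep -ic "$bot" "$log" || true)
  printf '%s\t%s\n' "$bot" "$hits"
done)

echo "$report"
```

Note that grep -c counts lines, not occurrences, so a user agent that mentions the bot name twice still counts as one hit.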

3. get certain elements from a line with the awk command:

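grep "www.google.com/bot.html" 1.log | awk '{print $4 "\t" $7}' > googlebot.log

I usually use the line above to extract the date and the URL of each page Googlebot crawled. In the common Apache log format those are fields 4 and 7, and field 4 keeps its leading bracket, as in [14/Apr/2011:10:00:00.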
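The same fields can be summarized instead of just listed. The sketch below counts how often Googlebot requested each URL and sorts the busiest first. The sample log is invented, and it again assumes the combined log format, where field 7 is the requested path.

```shell
# Hypothetical sample log: Googlebot fetches / twice and /blog once;
# the last line is an ordinary browser and should be ignored.
log=$(mktemp)
cat > "$log" <<'EOF'
66.249.66.1 - - [14/Apr/2011:10:00:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [14/Apr/2011:10:05:00 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [14/Apr/2011:11:00:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
10.0.0.7 - - [14/Apr/2011:11:02:00 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/4.0"
EOF

# Keep Googlebot lines, tally field 7 (the URL) in an awk array,
# then sort numerically so the most-crawled URL comes first.
top=$(grep -F 'www.google.com/bot.html' "$log" |
  awk '{ hits[$7]++ } END { for (url in hits) print hits[url] "\t" url }' |
  sort -rn)

echo "$top"
```

A list like this gives a quick view of which pages Googlebot requests most often and which it rarely visits.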

Be creative yourself and show off what you do with server logs for SEO.
