The search function of a website plays an important role in the site’s user experience and information architecture. It helps your users find what they are interested in on your website more quickly.
It is usually used like Google: a user comes to your site, types a query into the search box, presses the button, and gets a search result page…
Everything seems okay.
Wait… a user can search for whatever they want using your site’s search box, with zero control from you, right?
The problem is that the search function can be a source of user-generated content that you have overlooked and have little control over.
What if I search for “www.spam-site.com” on your site?
I will PROBABLY get a result page that:
- has content like “sorry, nothing found for www.spam-site.com”;
- has a valid URL, maybe something like www.yoursite.com/?s=www.spam-site.com;
- returns a 200 (OK) server response.
You see what the problem is: now your site has a valid page whose content contains the address of a spam site, and that page can be visited by both humans and bots.
What if I search for “<a href="http://www.spam-site.com">Spam</a>”? If your result page echoes the query without escaping it, the link to the spam site now even has anchor text!
What if I search for “blah blah <a href="http://www.spam-site.com">Spam</a> blah blah”? Great, now the page has richer content, with a link with anchor text pointing to a spam site… I can even make the “blah” part unique, better, longer and more relevant, with images, videos… as long as your site allows it. You know, Panda stuff.
Spam is not the most dangerous part of it. Worse, someone can also use this to harm your site’s reputation.
We know search engines don’t like certain stuff, such as adult content and illegal products.
What if I search for “words-you-don’t-want-your-children-to-hear” and “stuff-you-don’t-want-your-kids-to-try” on your site?
What if I create tens of thousands of pages like that and build links to them so they can be found by crawlers?
Generally, to prevent this, you need to tell search engine crawlers that those harmful result pages are invalid. You could make the “sorry, nothing found for” page return a 404 response, add a noindex robots meta tag to it, build a filter or stop-word list for your site’s search engine, use Google Custom Search instead, or use a similar method.
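Here is a rough sketch of how a few of those ideas could fit together. It is not tied to any particular CMS or framework. The function names, the `BLOCKED_TERMS` list and the URL pattern are all made up for illustration, and sending noindex through the `X-Robots-Tag` response header and escaping the echoed query are my own assumptions about how you might wire it up.

```python
import html
import re

# Hypothetical stop-word list; a real site would maintain its own.
BLOCKED_TERMS = {"spam", "casino"}

# Rough, illustrative pattern for URL-like strings in a query.
URL_PATTERN = re.compile(
    r"(https?://|www\.)\S+|\b[\w-]+\.(com|net|org)\b", re.IGNORECASE
)


def is_suspicious(query: str) -> bool:
    """Return True if the query contains a URL-like string or a blocked term."""
    if URL_PATTERN.search(query):
        return True
    words = re.findall(r"[a-z0-9']+", query.lower())
    return any(word in BLOCKED_TERMS for word in words)


def build_search_response(query: str, results: list[str]) -> dict:
    """Decide the status code, headers and body for a search result page."""
    headers = {"Content-Type": "text/html; charset=utf-8"}

    if is_suspicious(query):
        # Filtered query: do not echo it back, and tell crawlers to skip the page.
        headers["X-Robots-Tag"] = "noindex, nofollow"
        return {"status": 404, "headers": headers,
                "body": "<p>Sorry, nothing found.</p>"}

    if not results:
        # Empty result page: 404 plus noindex, with the query escaped so any
        # markup in it is shown as text instead of becoming a live link.
        headers["X-Robots-Tag"] = "noindex, nofollow"
        safe_query = html.escape(query)
        return {"status": 404, "headers": headers,
                "body": f"<p>Sorry, nothing found for {safe_query}.</p>"}

    items = "".join(f"<li>{html.escape(item)}</li>" for item in results)
    return {"status": 200, "headers": headers, "body": f"<ul>{items}</ul>"}
```

With something like this, a search for www.spam-site.com gets a 404 page that never mentions the spam site, while a normal query with results still gets a regular 200 page.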
Be creative :)
And by the way, I still haven’t come up with a proper name for it. Any ideas?