What is a good search engine?
To answer this question, we have to look at what a search engine is designed to do. A search engine’s task is really simple: helping users find the information they are looking for on the Internet.
So to be called good, a search engine has to be able to do at least three things:
1. Understand the demand of users, which is represented by their search queries and other related behavior data;
2. Organize well the resources it can supply (the web pages it has crawled and indexed);
3. Match the demand and the supply with its algorithms.
As covered in a previous post of mine, today’s commercial search engines mostly do a good job with traditional algorithms like TF-IDF, BM25, HITS, Hilltop, PageRank, etc. What can really make one stand out from the crowd and leave the others behind is user data mining, which in some ways is like reading users’ minds. (Isn’t that amazing?)
How do search engines recognize the demand of users?
The only gate that connects search engines and their users is the search box. The query a user types into the search box is, in most cases, the only piece of information a search engine gets about that user’s demand.
But this can be really challenging. For example, look at two queries: “chicken” and “Avatar”.
Both queries have a large search volume, but the demands of the users behind them are clearly very different. Someone who searches for “chicken” is more likely to be looking for chicken recipes or takeout than for articles about the movie Chicken Run, while most of those who search for “Avatar” are looking for ticket information, videos, or trailers for the movie Avatar.
Search engines have many ways of identifying users’ demand, such as how users edit their input and the clickstream data.
a) How a user edits their input.
This is easy to understand. If you type a query into the search box and don’t find what you need, you will probably change your query and give it another try.
For example, I want to order some chicken over the Internet, so I google “chicken”, but most of the information on the SERP (search engine results page) is about chicken recipes. So I change my query to “chicken takeout”.
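To make this concrete, here is a minimal Python sketch of how query edits could be mined. It is only an illustration, not how any real search engine does it: the session format (each session is a list of queries in the order they were typed), the function name `refinement_counts`, and the sample data are all my own assumptions.

```python
from collections import Counter

def refinement_counts(sessions):
    """Count which terms users add when they narrow an earlier query.

    `sessions` is a hypothetical log format: one list of queries per
    session, in the order the user typed them.
    """
    counts = {}
    for queries in sessions:
        # Look at each pair of consecutive queries in the session.
        for prev, nxt in zip(queries, queries[1:]):
            prev_terms = prev.lower().split()
            next_terms = nxt.lower().split()
            # Treat nxt as a refinement only if it keeps every term
            # of prev and adds at least one new term.
            if set(prev_terms) < set(next_terms):
                added = " ".join(t for t in next_terms if t not in prev_terms)
                counts.setdefault(prev.lower(), Counter())[added] += 1
    return counts

# Made-up sample sessions, for illustration only.
sessions = [
    ["chicken", "chicken takeout"],
    ["chicken", "chicken recipes"],
    ["chicken", "chicken takeout"],
    ["avatar", "avatar trailer"],
]

counts = refinement_counts(sessions)
print(counts["chicken"].most_common())
```

In this toy data, “takeout” is the term most often added after “chicken”, which hints at what those users wanted in the first place.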
b) Clickstream data.
Which web pages do users prefer? The clickstream data can tell. If you find what you need on the SERP, you click on it, and if not, you ignore it. It’s that simple.
All of this information is collected by search engines to identify users’ needs.
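One simple way to turn clicks into a signal is the click-through rate: how often a result gets clicked out of the times it was shown for a query. The sketch below assumes a hypothetical log of `(query, url, was_clicked)` records; the function name, the record format, and the example URLs are made up for illustration, and real systems use far richer signals.

```python
from collections import defaultdict

def click_through_rates(log):
    """Return clicks / impressions for every (query, url) pair.

    `log` is a hypothetical list of (query, url, was_clicked) records,
    one per time a result was shown on the SERP.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, url, was_clicked in log:
        shown[(query, url)] += 1
        if was_clicked:
            clicked[(query, url)] += 1
    # Every key in `shown` has at least one impression, so no division by zero.
    return {key: clicked[key] / shown[key] for key in shown}

# Made-up sample log, for illustration only.
log = [
    ("chicken", "recipes.example/roast", True),
    ("chicken", "recipes.example/roast", True),
    ("chicken", "recipes.example/roast", False),
    ("chicken", "movies.example/chicken-run", False),
    ("chicken", "movies.example/chicken-run", False),
]

rates = click_through_rates(log)
for (query, url), rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{query!r} -> {url}: {rate:.2f}")
```

Here the recipe page is clicked two times out of three, while the movie page is never clicked, so for “chicken” the recipe page looks like the better match.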
Your thoughts? Don’t hesitate to comment 🙂