Tuesday, August 12, 2008

The Age of Google (2): Open Sesame

(the previous installment: The Age of Google (1))

It is an old Arabic story. By accident, a poor young man named Ali Baba overheard a group of thieves, forty in total. From what they said, he learned of a tremendous treasure hidden in a cave. The magic spell to open the cave was "Open Sesame." Armed with the secret spell, Ali Baba entered the cave and took some of the treasure home.

The World Wide Web is the cave full of treasure. The search engine sites are the magic doors. The user-specified keywords are the "open sesame."

Among all the magic doors, Google is the most marvelous one. Google has standardized the way online information is ranked. Before Google, we had diverse standards for ranking the relevance of information on the Web. Because of Google, the standards converged. This convergence is an important sign of the age of Google.

Subjective ranking vs. Objective ranking

Fundamentally, should ranking be subjective or objective? This was (and probably still is) a crucial debate in the realm of Web search. The issue is not only a technological argument but also a business decision.

Technologically, in the time before Google, subjective ranking generally performed far better than objective ranking. Yahoo provided significantly more relevant search results than the other search engines, which used objective ranking methods. Although Yahoo's subjective policy made its operating expense higher, the superior quality of its search results made the cost worthwhile.

On the business side, Web search was coupled with the online advertising business model from the beginning. Commonly, Web search engines sold their first few related search results to certain advertisers; such a policy was once the basis of online advertising. The subjective-ranking search engines apparently executed this policy better than their objective competitors, because they selected relevant results subjectively anyway. There was nearly no extra cost for subjective search engines to embed the business model into their framework. By contrast, the objective search engines might have to execute two policies in parallel to reach both the technological goal and the business goal. Therefore, the overhead of subjectively deploying advertisements eroded the advantage that objective ranking held in its low cost of execution.

Google cleverly resolved the dilemma on the side of objective ranking. As a result, objective ranking declared victory over subjective ranking, at least until the present. (Note that the subjective search strategy is now striking back. Represented by Mahalo, the subjective search policy is regaining its momentum. I will analyze this phenomenon in a later installment of this series when discussing the challenges Google faces.) Another consequence of the resolution is what we all know: the victory clinched Google's championship in Web search over the previous leader, Yahoo.

What Google did was actually twofold. The first part was on the technological side, where Google implemented a brand-new objective ranking policy that significantly improved performance. We will discuss this part a little later in this post.

The other part was the revision of the online advertising business model. We have seen that the previous online advertising business model favored subjective search. Since Google's technology belonged to objective search, the company tried to discover a new business model that was more compatible with objective search. Eventually, Google invented AdSense and its sister program AdWords. There have been many discussions about the two programs. In fact, I will discuss them again in the next installment. Here, however, the discussion is solely about the impact of the two programs on the contest between subjective search and objective search.

AdSense and AdWords reshaped the online advertising business model from subjective judgments made by the search engines to objective decisions made by the run-time mapping between search queries and advertising words. Please note that this change did not actually improve the performance of advertising in the technological sense. It did, however, bring two critical improvements to the online advertising business: (1) it decreased the unit cost of deploying an advertising word, so that for the same amount of money advertisers could now ask for more advertising words to be associated with their advertisements; and (2) it is implicitly empowered by the long tail effect, since objective search automatically associates even rarely used advertising words with the advertisement.

In order to thoroughly exploit the advantage of objective ranking, Google made another great decision. Google stopped mixing the search results with advertisers' product links. Google displays all search results ordered purely by their objective ranks. Advertisers' links, by contrast, are laid out separately, for example in the sidebar of the related search results page. By this revision of advertisement deployment, Google won a reputation for integrity and objectivity in Web search. This reputation has been a crucial part of the foundation of Google's business success.

Cluster weighting vs. Link popularity

Another debate in Web search is between ranking by cluster weighting and ranking by link popularity.

In short, ranking by cluster weighting classifies documents by measuring the closeness between the typed keywords and predefined clusters of information. The methodology may be implemented with machine learning or with the mathematical analysis of vector computation. Ranking by link popularity, by contrast, measures the relevance of a document by how widely it is linked to on the Web. More widely linked documents are assigned a greater relevance value.
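To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the content-based side is reduced to cosine similarity between a query's term counts and each document's term counts (no predefined clusters, no machine learning), and the link-based side is reduced to counting inbound links. The function names and the toy data format are hypothetical, not taken from any real search engine.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_content(query, docs):
    """Content-based ranking (a simplification of cluster weighting).

    docs maps a document name to its text. Every document must be
    compared with the query, which is where the run-time cost comes from.
    """
    query_vec = Counter(query.lower().split())
    scores = {
        name: cosine_similarity(query_vec, Counter(text.lower().split()))
        for name, text in docs.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


def rank_by_inbound_links(links):
    """Link-popularity ranking in its crudest form: count inbound links.

    links maps a page to the list of pages it links to. The content of
    the pages is never inspected.
    """
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] = counts.get(target, 0) + 1
    return sorted(counts, key=lambda page: (-counts[page], page))
```

The first function has to look inside every document for every query, while the second only needs the link graph, which can be analyzed ahead of time and independently of any query.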

Certainly, as many of us know, Google is a supporter of the latter policy. The PageRank algorithm is a famous representative of this school of thought.

In theory, however, we should expect cluster weighting to be superior to link popularity for ranking search results. Essentially, the cluster weighting methodology looks directly for content relevance, while link popularity reveals relevance only indirectly, through the preferences expressed in human activity. In other words, the truth itself is always more correct than people's vote on the truth. If we can reveal the truth directly, we do not need to rely on secondary votes to guess it.

The problem, however, is whether we are truly able to compute the truth and, even if we can, how much the computation would cost. This is where ranking by cluster weighting runs into trouble.

The beauty of PageRank is that it largely avoids the complexity of vector computation between search keywords and content keywords. That computation is very costly both in run-time execution and in the offline optimization it requires. Google believed that investing the same amount of money in storing and analyzing the topological structure of the World Wide Web would bring a greater reward than investing in cluster weighting computation. Indeed, Google proved itself right.
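For readers who want to see the idea in miniature, the following is a simplified sketch of a PageRank-style computation by power iteration. It is a textbook-style illustration, not Google's production implementation: the function name, the dictionary format for the link graph, the fixed iteration count, and the way pages without outgoing links are handled are all assumptions made for this example. The damping factor of 0.85 is the commonly cited default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # score evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

Note that the computation uses only the link structure and is independent of any query, so it can be done ahead of time rather than while the user waits.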

In essence, public voting is probably the most cost-efficient way to approach the truth when the truth is unknown. Public voting does not always reveal the truth. But if the truth is too expensive to reveal, public voting is what most people accept, and it generally reveals something close to the truth. This is the same philosophy as democracy. So Google was actually exercising Web 2.0 implicitly even before Web 2.0. Personally, I believe this is an important reason why Google eventually became an early leader of Web 2.0.

But ranking by cluster weighting is not dying. The policy is actually preparing a strong comeback now. Semantic search is the modern mutation of this traditional policy. As many people suggest and expect, semantic search (if it is realized) would certainly outperform the search currently executed by Google. But how to reduce the execution cost remains a grand challenge. We will discuss semantic search further in the following installments.

Open Sesame

In summary, Google's "open sesame" is to explicitly execute low-cost objective ranking on top of the implicit, free, subjective ranking (link popularity) performed by regular Web users. This is a model that puts "objectiveness" on top of the "subjectiveness of the crowd."

However, the business success of this "open sesame" was still not enough for Google to define an age. Google might have been just another successful company if nothing else had happened. There is another crucial reason that eventually lifted Google from a successful company to a legend. We are going to discuss it in the next installment.

(The next installment: Web 2.0)
