Tuesday, May 08, 2007

Web Search, is Google the ultimate monster?

(Revised on Sept. 29, 2007)

If investing in web technologies is like buying jewelry, web search technology is the most brilliant diamond in the collection. A recent post at Read/WriteWeb, Top 17 Search Innovations Outside Of Google, once again attracted hundreds of clicks and dozens of comments. Google or post-Google: the question attracts eyeballs.

New companies may beat Google, but not the Google way. At present, this claim implies not only innovation in search technologies, but also a revolution in basic web search strategies. As I discussed in an earlier post, it is unlikely that we can build another, more advanced "semantic Google" to beat the current Google. To truly defeat Google, the basic strategy of web search must undergo a revolutionary change.

The current web search strategy can be summarized as oracle-based web search. When web users search for information, they seek oracles from the "Gods of the Web." These "Gods" are search engines, which are assumed to know far more about the Web than we ordinary people do. Among these "Gods" the greatest is Google. In the current WWW, Google is the "God." We beg it for oracles to reach the web resources we expect. If Google returns no answer, by convention we simply believe that the Web holds no answer to our question. This scenario shows how much we trust and rely on this "God." Although nearly all of us know that Google often makes mistakes, and that even Google can search only a comparatively small portion of all web resources, we have hardly any better choices. For many companies, competing for a better Google rank is a top priority. Isn't it a pity to make life miserable because of a non-existent "God"?

This is not the first time in history that we humans have had this experience. In almost all ancient countries, our ancestors worshiped various non-existent Gods and begged them for vague oracles. During those dark days, formal education was not widespread. Knowledge was held primarily in the hands of a few priests who were "servants" of the various Gods. These priests produced vague oracles in the names of these non-existent Gods to control people's minds. If even the priests could not answer a question, ordinary people at the time had to believe that the question had no answer. Asking priests was the primary way to gain knowledge of the unknown world. This scenario exactly mirrors the current stage of web search.

How did our ancestors get out of this darkness? The answer is one word: education! The spread of formal education liberated humans from the control of vague oracles. Once education became widespread, people found new ways to explore the unknown. Rather than training new priests who supposedly knew everything, the new education system trained ordinary people to be specialists in various domains. People therefore no longer needed to seek vague oracles from priests. Instead, they sought much clearer and more precise advice from specialists in particular domains. They no longer laid their hopes on the shoulders of man-made Gods. They started learning to help each other by sharing individual knowledge. This was the great Renaissance. Such a formal education and social system became the basis of our modern society.

The evolution of web search will follow a similar path. Sooner or later, the "Gods of the Web" will step down from the stage. Though the algorithms for centralized web search are improving, knowledge accumulates much faster than the algorithms improve. This was the essential reason why, in the old days, the priests could no longer pretend to be Gods. The body of knowledge simply became too large to be held by small groups. Educating everyone became the sole solution, and it worked.

What is an educated web? The educated web is the Semantic Web. To build this educated web, we need to spread awareness of education. On the Web, this means the awareness that we need to educate machines. Constructing the Semantic Web does not mean building a system. No, we do not build a system; we educate individuals, and the network of these individuals automatically becomes an educated web.

Specialist-based search (i.e., a combination of vertical search and collaborative search) will gradually replace oracle-based search. With more and more user-generated tags, the web is moving in this direction. What is still lacking is advocates for educating machines on the web. But we are certainly on the right track.

Final Address

Don't try to build another Tower of Babel. Our ancestors tried once, and the attempt failed. If the physical tower failed, could we succeed in building a mental Tower of Babel? I strongly doubt it. Constructing the Semantic Web is not building a Tower of Babel. It is simulating our modern human society. A "Semantic Google" is impossible to achieve. But the Semantic Web will still arrive. On that day, however, there will be no noble seat reserved for an oracle-announcing Google.

For readers who would like to know more about this vision of web evolution, Part 2 of the web evolution article is an old but coherent version of the view. A newer version is on this blog, in the series "A View of Web Evolution." (Follow the tag web evolution to find it.) Any comments are more than welcome.


Saravanan's post: Beat Google!

Sunday, May 06, 2007

Evolution of Web Links, another direction of thoughts

Danny Ayers published his most recent column in IEEE Internet Computing: Evolving the Link. In this article, Ayers summarizes his vision of how web links evolve as the World Wide Web progresses. I agree with his vision, but I have some supplementary thoughts about the evolution of web links.

Before presenting my supplementary thoughts, I would like to briefly review what Danny presented about web link evolution. In general, Danny's vision follows a strictly technical line. In the beginning, web links were anonymous connections that linked one web page to another. Using his example, "<a href="http://creativecommons.org/licences/by/2.0/">cc by 2.0</a>" produces an anonymous connection from the page that contains this markup to a particular web location: "http://creativecommons.org/licences/by/2.0/". Though each destination has its distinctive meaning, web links themselves mean nothing more than that the referenced web resources are ABOUT the local text. The meaning of ABOUT is, however, simply too rich to be properly distinguished.

To solve this problem, the "rel" attribute was designed to describe the relationship from the current document to the destination resource. For example, "<a href="http://creativecommons.org/licences/by/2.0/" rel="license">cc by 2.0</a>" shows the meaning of the link to be "license." This solution still has a problem, because the content of the rel attribute is often not machine-processable.
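To make the difference between the two forms concrete, here is a minimal sketch in Python that reads both kinds of anchors and reports the link target together with any rel values. It uses only the standard library's html.parser module; the names LinkCollector and extract_links are made up for this illustration and do not come from Danny's column.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, rel values) pairs from <a> tags. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel holds space-separated link types; an anonymous link has none.
        rel_values = (attr_map.get("rel") or "").split()
        self.links.append((href, rel_values))


def extract_links(html):
    """Return every (href, rel values) pair found in an HTML fragment."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


ANONYMOUS = '<a href="http://creativecommons.org/licences/by/2.0/">cc by 2.0</a>'
TYPED = '<a href="http://creativecommons.org/licences/by/2.0/" rel="license">cc by 2.0</a>'

print(extract_links(ANONYMOUS))
print(extract_links(TYPED))
```

The anonymous link yields an empty list of relationships, while the typed link yields the single string "license". That string is still just a label: a program can read it, but nothing in the markup tells the program what "license" means, which is the limitation described above.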

In his column, Danny proposes that a potential solution to this last problem is to treat links as data. Thus, we map not only data but also links to proper RDF descriptions. With these mappings, machines can automatically interpret the meanings not only of resource nodes on the web, but also of the links among these resource nodes. This vision of web links concludes the article.

Although I agree with Danny's vision, I feel that something else is also important to the evolution of web links but is missing from the discussion in this article. Besides semantic meaning, vulnerability is another essential aspect of web links. Traditional handcrafted, anonymous web links are often vulnerable to individual prejudice. For example, I have created a normal web link from this post to Danny's blog, which shows a relation between this post and Danny. However, I could also arbitrarily remove this link (but not the content), so that there would be no immediate link from this post to Danny, though there should be such a link since the content has not changed. This simple example shows a basic problem with traditional links: they are vulnerable to individual prejudice.

Though solving this vulnerability problem is not the intended driving force of web link evolution, an important side effect of this evolution is that web links gradually become less vulnerable. With the emergence of Web 2.0, more and more indirect web links are created based on common tags. For example, I have tagged this post with the keyword "Danny Ayers," while Danny's blog is also tagged with the keyword "Danny Ayers." Therefore, even if I removed the normal href-style direct link from my post to Danny's blog, there would still be an immediate link between this post and Danny's blog because we share a common tag. Due to the objectivity of tagging, this type of web link is less vulnerable to individual prejudice than the earlier purely handcrafted web links.
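The idea of links that exist because of shared tags can be sketched in a few lines of Python. This is only an illustration under simple assumptions: resources are plain strings, tags are compared case-insensitively, and the function name tag_links and the sample resource names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tag_links(tagged):
    """Derive implicit links between resources that share at least one tag.

    `tagged` maps a resource name to its set of tags. The result maps each
    linked pair (in sorted order) to the set of tags the two have in common.
    """
    # Inverted index: tag -> resources carrying that tag.
    by_tag = defaultdict(set)
    for resource, tags in tagged.items():
        for tag in tags:
            by_tag[tag.lower()].add(resource)

    # Every pair of resources under the same tag is implicitly linked.
    links = defaultdict(set)
    for tag, resources in by_tag.items():
        for first, second in combinations(sorted(resources), 2):
            links[(first, second)].add(tag)
    return dict(links)


tagged = {
    "this-post": {"Danny Ayers", "web links"},
    "dannys-blog": {"Danny Ayers", "semantic web"},
    "unrelated-page": {"cooking"},
}

print(tag_links(tagged))
```

Note that the link between the two resources is computed from the tags alone. No author wrote it by hand, so no author can delete it without also changing the tags, which is the sense in which such links are less vulnerable.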

As the web evolves toward the ideal semantic web, we can predict the creation of more and more objective links between web resources. These links will be based increasingly on common meanings (objectivity) rather than individual preferences (subjectivity). As Danny presented in his column, when links are shared as open data annotated by formal taxonomies, the existence of these links becomes less and less vulnerable to individual prejudice. It is the content itself that will decide the links.

This evolutionary aspect matters not only to web links but also to the WWW in general. It means that the web is going to be woven not only in more and more detail, but also more and more objectively. Based on this conclusion, we can make several interesting predictions about the future web.

  1. Tags and annotations are going to be premier resources for web search. Because they are more objective (less vulnerable), the network composed of tags and annotations is going to be more stable than the network composed of handcrafted links. This may significantly improve the efficiency of web search.

  2. The weights of tags and annotations are going to be more and more critical in ranking search results. The balance between these weights (the objective side) and link popularity (the subjective side) will be an essential issue for new web search ranking algorithms. Does this mean the dusk of the PageRank algorithm and thus the decline of Google? We are not sure yet. But at least this is not good news for Google.

  3. Tags and annotations are going to weave the web into closely related communities with distinct topics. As a result, vertical search engines may replace horizontal search engines as the basis of web search. At present, vertical search relies on horizontal search and then provides more detail on particular domains. In the future, horizontal search will be based on vertical search and then provide more detail on cross-domain communication. This role switch in web search may significantly affect the structure of the web industry, especially the web search business.
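The balance mentioned in the second prediction can be pictured as a simple weighted blend. The Python sketch below is purely illustrative: it is not PageRank and not any real search engine's formula. The parameter alpha, the function names, and the sample scores (assumed to be already normalized to the range 0 to 1) are all assumptions made for this example.

```python
def blended_score(tag_weight, link_popularity, alpha=0.5):
    """Blend a tag-based score with a link-popularity score.

    alpha = 1.0 ranks by tags only; alpha = 0.0 ranks by link popularity only.
    Both inputs are assumed to be normalized to [0, 1] beforehand.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * tag_weight + (1.0 - alpha) * link_popularity


def rank(candidates, alpha=0.5):
    """Order resource names by blended score, best first.

    `candidates` maps a name to a (tag_weight, link_popularity) pair.
    """
    return sorted(
        candidates,
        key=lambda name: blended_score(*candidates[name], alpha=alpha),
        reverse=True,
    )


# Made-up scores for two hypothetical pages.
candidates = {
    "well-tagged-page": (0.9, 0.1),
    "heavily-linked-page": (0.2, 0.8),
}

print(rank(candidates, alpha=0.8))  # leans on tags
print(rank(candidates, alpha=0.2))  # leans on link popularity
```

Sliding alpha from 0 to 1 flips the order of the two pages, which shows in miniature why the choice of balance would matter so much to a ranking algorithm.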

I have more discussion of web evolution in general in my article Evolution of World Wide Web. The most recent post is Part 2, Web Evolution Theory and The Next Stage. In this part, we study several web evolution laws and combine them into a basis for predicting the evolutionary future of the World Wide Web. Though these are only our viewpoints, we hope they bring some fresh air into the study of web evolution.

Wednesday, May 02, 2007

A fantastic map of a world of the web

Randall Munroe of xkcd.com has drawn a map of online communities as a world of the web. Although I believe we are going to explore more unknown land in this world of the web, this is a very creative piece of work. I would like to have a T-shirt of it.

Evolution of World Wide Web, Part 2, Theory and the Next Stage

I uploaded the second part of the article Evolution of World Wide Web on April 27. In this article we start with a discussion of several exciting web evolution laws. Then we apply these laws to predict the next stage of web evolution, which may be called Web 3.0.

This article is still in progress. Please leave your comments after you read it. I will update this post and the article regularly.