Sunday, April 29, 2007

Progress of World Wide Web: Evolution or Intelligent Design (revised, second draft)

Evolution or intelligent design? This has been a grand question since the publication of the Origin of Species. Now the question moves to the realm of the World Wide Web. Will the web advance by its own internal laws, or will its progress be decided by the intelligent design of leading web scientists?

One thing is certain: the origin of the WWW was by intelligent design. Furthermore, every single web technology was, is, and will continue to be designed by humans. It is unlikely that the web will someday create new technologies entirely by itself, without human supervision. So is there anything wrong with our question?

Our question is not about which party (humans or the web) is going to create web technologies. Instead, it asks which party decides the fate of new web technologies, i.e., which technologies survive and which ones wane. If that fate is decided by some objective laws of the WWW, the web is evolving, since this matches the view of Darwinian natural selection. If, however, the fate of new web technologies continues to be decided by the preferences of small groups of web elites (as it used to be), the web is in the hands of intelligent design.

I believe in web evolution. Though the web was once dominated by intelligent design, that period has already ended. To explain why, we can look at the general patterns by which other progressive events unfold.

Many progressive events have two distinct periods in their lives: the initial period and everything afterwards. Because of the significant degree of chaos, the principles that dominate the initial period are often different from the principles that dominate later on. As a typical example, the Big Bang theory tells us that many natural laws that generally govern our universe at present did not apply effectively during the universe's initial period. Furthermore, even Charles Darwin's famous evolutionary theory of species was only an abstraction of observations of natural phenomena made long after the real origin of species. Many researchers (Darwinians included) agree that it needs modifications (and several already exist) to explain the evolutionary events close to the origin of this world.

During its initial period, a progressive event builds greater and greater momentum toward its further advancement. For events such as the progress of the universe and of nature, the accumulated momentum eventually becomes so great that they grow insensitive to external forces. From then on, their respective evolutionary laws take over.

The same pattern applies to the World Wide Web. At the beginning of the World Wide Web, there were few web users, so individual innovations could comparatively easily redirect the advance of the WWW. This was the initial period of the WWW.

As the web kept growing and eventually engaged hundreds of millions of users contributing their data and services, it became harder and harder for individual persons or organizations to lead its progress. Following the general law of the transformation of quantity into quality, once the number of web contributors passed some threshold, the quality of the WWW changed. The web would no longer be controlled by intelligent designs; on the contrary, it is intelligent designs that must obey the intrinsic evolutionary laws of the WWW. Otherwise, new innovations will not be adopted by the public, at least not for the time being.

The emergence of Web 2.0 and the slow realization of the Semantic Web show that this threshold has already been reached and passed. As we know, the W3C is the most important leading organization of the World Wide Web. Many of the intelligent designs it has guided have greatly shaped the current web. In the late 1990s, the W3C initiated a new project aimed at the future of the World Wide Web, which became well known as the Semantic Web. Thanks to its exciting descriptions of the future (see this great article in Scientific American) and the reputation of the W3C, the Semantic Web project soon engaged hundreds of the most talented web researchers all over the world. Several years later, however, the Semantic Web paradigm still remains in the state of lab experiments. What is the reason? Certainly the problem is not its vision; the vision of the Semantic Web is great and unquestionable. The real problem is that the realization of the Semantic Web is ahead of the timetable of web evolution. The web may have already passed its initial period before the Semantic Web was proposed; by then it had started progressing on its own timetable rather than being controlled by intelligent designs. This is the key point behind the slow public adoption of the Semantic Web.

We can reach the same conclusion from the other side: the emergence of Web 2.0. To the surprise of many web researchers, especially devoted Semantic Web researchers, the quick adoption of Web 2.0 is an ironic fact. The number of devoted Web 2.0 "researchers" is quite small. Many of the Web 2.0 pioneers are programmers, journalists, publishers, and all kinds of people you can name, but few of them are leading "scientists" of the World Wide Web. Through this group of somewhat "less professional" and loosely cooperating researchers, Web 2.0 became a fashion in just a few years (almost the same number of years during which the Semantic Web has been struggling).

How could the Semantic Web lose to Web 2.0 in this competition? Is it because Semantic Web technologies are more complicated? Maybe, but that was not the crucial reason. Both the Semantic Web and Web 2.0 began with simple methods, and they could have attracted ordinary web users and grass-roots web developers equally. With better cooperation, more leading scientists, and the influential W3C behind it, the Semantic Web surely should have beaten Web 2.0 into a corner. It should have been the Semantic Web, not Web 2.0, that became the current phenomenon. The only explanation that accounts well for the dramatic hype of Web 2.0 is that the web itself made the choice. The WWW has its own timetable, guided by the laws of web evolution. On that schedule, Web 2.0 was the right innovation at the right time, while the Semantic Web was not (it was ahead of its time).

Web evolution laws matter. This is the most important lesson we should learn from the Web 2.0 hype (and the slow adoption of the Semantic Web).

Saturday, April 21, 2007

Degree of Separation on Web 2.0

What is the degree of separation of the World Wide Web? Albert-Laszlo Barabasi reported a degree of separation of about 19 on the web in his best-selling book---Linked. This study, though remarkable, was based on the traditional web structure, in which links were rigidly hardcoded by webmasters. This picture of hardcoded links is closer to the leftmost regular network in the figure below, because the link specifiers must have prior knowledge of the existence of the destination pages. Hence these hardcoded links mostly point to neighbors on the basis of one-directional acquaintance.



With Web 2.0, however, this picture changes. One important difference between Web 2.0 and the traditional web (which we may harmlessly call Web 1.0) is the prevalence of human-specified tags. These tags may dramatically shorten the distance between two arbitrary web pages by linking them together without prior acquaintance. On Web 2.0, two web pages have a significantly greater chance than before of being linked at a distance of only 2 by sharing a common tag, without needing to know of each other's existence beforehand. The picture thus becomes closer to the small-world network in the middle of the figure above. According to the Watts-Strogatz model, adding a few random links to a regular network can significantly reduce the network's diameter, which means a further decrease in the degree of separation on the web.
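As a rough illustration of this effect, the following Python sketch (using the networkx library; the network size and rewiring probability are arbitrary illustrative choices, not measurements of the real web) compares the average separation in a regular ring lattice with that in a Watts-Strogatz small-world network built from the same lattice plus a few random shortcuts.

    # A minimal sketch of the Watts-Strogatz effect, with illustrative parameters.
    import networkx as nx

    n, k = 1000, 6  # 1000 nodes, each initially joined to its 6 nearest neighbors

    # p = 0.0: no rewiring, i.e. the regular lattice (leftmost network in the figure)
    regular = nx.connected_watts_strogatz_graph(n, k, p=0.0)

    # p = 0.05: a few random shortcut links, i.e. the small-world network (middle)
    small_world = nx.connected_watts_strogatz_graph(n, k, p=0.05, seed=42)

    print("regular lattice  :", round(nx.average_shortest_path_length(regular), 1))
    print("small-world graph:", round(nx.average_shortest_path_length(small_world), 1))
    # The handful of random links cuts the average separation dramatically,
    # which is the effect that shared tags arguably have on Web 2.0 pages.

With these illustrative parameters, the lattice's average separation is on the order of n/(2k), while the rewired network's is typically an order of magnitude smaller.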

Beyond Web 2.0, and with the emergence of the Semantic Web, the degree of separation on the web will come closer and closer to the degree of separation among humans in the real world, while the latter is itself decreasing because of the prevalence of the WWW. Traditionally, the degree of separation among humans is regarded as 6, based on the famous theory of six degrees of separation. Recently, Thomas Friedman declared in his book "The World is Flat" that the relational distance between arbitrary persons shortens as the world gets flattened. Eventually, in a flattened world, anyone who shares a common interest can become acquainted with anyone else, regardless of physical distance. Web pages in a flattened world will become closely linked to each other as long as their human masters are acquaintances. Hence the degree of separation among web pages will essentially equal the degree of separation among real humans, which will be greater than 2 but less than 6 in a flattened world.

Thursday, April 19, 2007

We and Machine

As reported, Tim O'Reilly opened the keynote phase of the Web 2.0 Expo describing the Web 2.0 philosophy: "It's about building the global computing network and harnessing all the collective intelligence of all the people who are connected…. We are talking about persistent computing in which we are becoming part of a great machine."

I see more and more people starting to agree with a point that I also raised in my article on web evolution, i.e., that we are cloning a certain degree of our consciousness into machines.

But we are not becoming part of a machine. Rather, we are machinizing our minds and letting these machinized minds compose a society, which is the World Wide Web. The Web is a society, and computing on the web is the aggregation of individual capabilities. Isn't that a better metaphor than viewing the entire web as a single machine?

Wednesday, April 18, 2007

New web battle is announced

(updated Dec. 21, 2007)

Google is moving deeper and deeper into Microsoft's realm. Yesterday at the Web 2.0 Expo, Eric Schmidt was interviewed by John Battelle. During the interview, Eric announced a newly launched presentation feature for Docs & Spreadsheets. Together with the previously released Google Apps, Google is now squarely on track to fight Microsoft, another great battle after Google versus Yahoo. What the two battles have in common is that in both of them Google is the challenger.

The battle between Google and Yahoo was thought of as a search-engine war, and many observers now view the coming battle between Google and Microsoft as a word-processing war. These beliefs are, however, nearsighted. In fact, both battles are consequences of web evolution. They are about one issue: whether we want to leverage the production of new-quality resources or to supplement the methods of operating old-quality resources. Let's take the Google-Yahoo battle as an example to explain this viewpoint. In that battle, Google represented the leveraging of new-quality resources and Yahoo represented the supplementing of old-quality resources.



One thing has to be clarified before we proceed: the reason Yahoo lost the battle to Google was not entirely the PageRank algorithm. PageRank helped Google grow to Yahoo's scale and helped Google declare war on Yahoo, but Yahoo was not defeated by this algorithm. Rather, it was the rise of Web 2.0 that finally defeated Yahoo and propelled the rise of Google.

Web 2.0 provides a new collaborative way to operate web resources. Consequently, it demands new-quality resources that can further leverage this ability to collaborate. The pattern is: collaborative methods ---> collaboration-friendly resources ---> more collaboration-friendly resources. Google caught this trend and led the revolution in producing new-quality resources, such as Gmail, Google Maps, Google Earth, Google Apps, etc. These products not only provide old-style services (such as email or map search) but also enable users to collaboratively customize the services into their own frameworks. As a result, by using these products people produce more new-quality, collaboration-friendly resources, which could hardly have been produced (and thus used) on the pre-2.0 web. Meanwhile, Yahoo still aimed to refine its old production line to promote the production of old-quality (generally collaboration-unfriendly) resources. It is this difference that caused Yahoo's inevitable decline.

Will history repeat itself? Will Microsoft repeat Yahoo's failure in this new battle? Maybe. This new battle is in the realm of word processing, but it is again about web evolution. It is about whether we will produce more new-quality (Web-2.0-quality) documents or more old-quality (Web-1.0-quality) documents. Web-2.0-quality documents can easily be shared, edited, and collaboratively edited on the web, as Google is projecting. Web-1.0-quality documents, however, continue to be edited primarily offline. Though we may augment these Web-1.0 documents with the ability to be shared online, as Microsoft is proposing, collaborative editing is certainly harder to implement if the developers still want to preserve the product's fundamental offline functions. Since Microsoft has already invested so much in its Office product, and succeeded so far, it is really difficult for Microsoft to start over on another path that directly competes with one of its most successful and profitable products. As a result, Google will win the battle again.

But there is a problem Google needs to be aware of. In yesterday's San Francisco Chronicle, Dan Fost wrote another article about this interview, which contained an interesting paragraph.

The main problem with the Google approach, as Forrester Research analyst Josh Bernoff pointed out just before the speech, is "if you're on an airplane or anywhere else where you have no connection, it doesn't work. They need to make it like Outlook, where you can compose offline and then when you get online, it syncs up."


Online or offline: is it a critical issue? In fact, several others have already pointed out that after some point in time there will be no offline unless someone particularly wants to be offline. Though at this moment we may still encounter the offline problem, it is not a critical issue in the long term.

By contrast, another side of this online-or-offline problem is much more critical than the problem itself. On the web, online means being public, while offline means being private. No matter how much we advance our Internet security techniques, going offline is always the ultimate way to protect privacy. In other words, less online always means more secure. Online applications will never completely replace offline applications, since we always have something that needs to remain private, i.e., not shared with others. This is why Google must provide not only online solutions but also offline alternatives, so as to maintain a private space for end users.

Monday, April 16, 2007

Semantic Web and The World is Flat

I am now reading The World is Flat, one of the best-selling books written by Thomas Friedman. This book is way too long and I have read only half of it. But the main idea is already clear---the world has been flattened by new technologies. The rest of the book is about how to face this new challenge, and it is more controversial than the first half.

Debate about this book is intense. For example, Matt Taibbi wrote a strong critique of it. Certainly, there is also much applause, such as this piece from Tim O'Reilly.

In general, I believe in the thesis of this book, i.e., that globalization is an unstoppable trend. Many old barriers have been broken by the revolution in new technologies, the most significant of which is the World Wide Web. The WWW connects people around the world at a new level, a level our ancestors dreamed of for centuries but that never came true until the World Wide Web became prevalent. Certainly, things like outsourcing, offshoring, and supply-chaining could still happen without the WWW, but they might never have been so widely understood, and thus accelerated on a global scale, without it. Therefore, the WWW is not just a flattener. It is the most essential flattener, because it delivers the knowledge of flattening on a global scale.

The Semantic Web is a new stage toward a flatter world. It is going to push the barrier of communication down to the level of instance data. On the W3C-proposed Semantic Web, the world will be so flat that even a child can dig up a fact as deeply as professional domain experts can. So what will the challenges be in such a flattened world, if a less trained child can do some things as well as professional experts? Does it mean that education becomes less and less important? That people won't need an MBA or an Ivy League business education? The answer is exactly the opposite.

Flattening does not solve everything. In fact, it solves much less than we expect. Flattening only brings the same problems to a different level, one that requires a higher (not lower) level of knowledge. For example, before flattening, a manager needs to know how to divide his work among his workers. These workers are often local, and managers and workers often know each other well. Furthermore, because the workers are local, the manager has fewer choices to make. Fewer choices, however, also mean less work for the manager. In a flattened world, by contrast, the manager knows that his work can be done piece by piece on a global scale. He has plenty of choices for assigning these pieces. The challenge, however, is knowing which pieces will perform better than others within his framework. Moreover, it is nontrivial for a manager to integrate these scattered pieces together, which is the so-called workflow. These requirements demand far more professional knowledge from the manager than before.

In the abstract, the process of flattening is the process of dividing tasks into tiny pieces so that they can be done by cheaper labor. This is why flattening leads to more and more outsourcing and offshoring: more and more previously complex tasks can now be done as multiple simpler tasks. But this classic divide-and-conquer method does not really remove the complexity of problems. It only pushes the complexity to an upper level, or shifts it to a different side. While it decreases the complexity of each single task, it increases the complexity of integrating these simple solutions into a complex solution. In general, the total complexity of the original problem is neither decreased nor increased; what changes is where we place it.

Therefore, more and more outsourcing and offshoring means a requirement for more and more integrators, orchestrators, and explainers. This was predicted by Thomas Friedman in his book.

The prevalence of the Semantic Web will result in fundamental changes to Computer Science education. When the barrier of data is eventually broken, we will need fewer and fewer of the middle-tier programmers that current Computer Science education trains. Most end-point programming tasks will be so simple that they can be done by less professional programmers. By contrast, we will need more and more high-level software architects who know how to integrate these low-level programs into a unified product that solves particular problems. This work certainly requires programming knowledge, but more of it is about art. These software architects will primarily be artists who understand the beauty of the world's facts before they dig into the details of integration. They are the ones Computer Science departments should train and produce.

Saturday, April 07, 2007

Semantic Web is closer to being real, isn't it? Or is it?

A recent post by technical evangelist Robert Scoble stirred up a small wave of Semantic Web hype again. His article was about a new achievement by Radar Networks, which was founded by Nova Spivack. The following is quoted from Robert's post.

Basically Web pages will no longer be just pages, or posts. They’ll all be split up into little objects, stored in a database (a massive, scalable one at that) and then your words can be displayed in different ways. Imagine a really awesome search engine that could bring back much much more granular stuff than Google can today. Or, heck, imagine you could view my blog by posts with most inbound links.


Robert expressed his excitement about watching the Semantic Web demo with Nova. Although I have not had the chance to experience it myself, I can roughly sense what is going on and which technologies Radar Networks is employing. The scenario he describes is one that current Semantic Web technology can support. Here at our research lab at BYU, we cooperate with some DERI researchers on a project aimed at making the Semantic Web real. In fact, our plan's paradigm is close to the scenario Robert describes, so his description feels familiar to me.
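To make the quoted scenario a bit more concrete, here is a toy Python sketch of the underlying idea: posts stored as structured records in a database rather than as opaque pages, so they can be re-sorted and re-displayed in different ways, e.g. "posts with the most inbound links." The schema and data below are invented purely for illustration; this is not Radar Networks' actual design.

    # Toy illustration: blog posts as database records that can be re-displayed
    # in different orders, e.g. by inbound-link count. Invented schema and data.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE links (source INTEGER, target INTEGER);  -- target is a post id
    """)
    db.executemany("INSERT INTO posts VALUES (?, ?)",
                   [(1, "Web evolution"), (2, "Degree of separation"), (3, "We and Machine")])
    db.executemany("INSERT INTO links VALUES (?, ?)",
                   [(10, 1), (11, 1), (12, 2)])  # post 1 has two inbound links, post 2 has one

    # "View my blog by posts with most inbound links"
    query = """
        SELECT p.title, COUNT(l.target) AS inbound
        FROM posts p LEFT JOIN links l ON l.target = p.id
        GROUP BY p.id
        ORDER BY inbound DESC
    """
    for title, inbound in db.execute(query):
        print(inbound, title)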

The idea of the Semantic Web has been discussed for years. The web is certainly moving toward holding more and more machine-processable semantics. Yet until now, Semantic Web research has still been mostly confined to labs. What are the reasons?

In our view, the reason is not technical difficulty. Technical difficulties are severe problems, but they are not the deadly ones. The real difficulty is who is going to take control of semantic definitions. Are these definitions controlled by a few elites or by the public? This is a grand question.

Many people have predicted the rise of a "semantic Google." But I would say there will never be a "semantic Google." Again, my argument is not about technical reasons. The problem is who would own such a "semantic Google." If it were a US company, I am sorry to say that countries such as China, Russia, or even the European Union would ban it and build their own semantic giants, because no major country in this world would endure having its "semantics" controlled in the hands of another country. This kind of threat is intolerable to any independent country.

In fact, we need not even raise this problem to the level of nations. Even among individual persons, no one wants to be forced to agree to semantics defined by someone else. This semantic-definition problem is indeed the most crucial obstacle to the prevalence of the Semantic Web.

What is the solution? In fact, the success of Web 2.0 has pointed to a pragmatic resolution: we need to allow the public to define semantics by themselves. This is the only way to ensure the prevalence of the Semantic Web. Any web user should be able to apply his machine agent to understand the web based on his own understanding. That is what a pragmatic Semantic Web should deliver. On such a pragmatic Semantic Web, no "semantic Google" can exist, because of the massive diversity of human understanding. A collaborative web search model will replace the current centralized search model. In fact, we have devised a new theory of collaborative web search on the Semantic Web, and hopefully it can be released soon.
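To make this point a little more concrete, here is a deliberately tiny Python sketch of what user-defined semantics could look like (my own illustration with invented vocabularies and resources; it is not our collaborative-search theory or any released system): each user's agent resolves a query term through that user's personal vocabulary, so no central authority fixes the meaning for everyone.

    # Tiny illustration of user-defined semantics (invented data, not a real system):
    # the same query term resolves to different concepts for different users.

    # Personal vocabularies: each user maps terms to the concepts *they* mean.
    alice_vocab = {"jaguar": "animal/big-cat", "apple": "food/fruit"}
    bob_vocab   = {"jaguar": "vehicle/car-brand", "apple": "company/apple-inc"}

    # Resources annotated with concepts (e.g. via tags or semantic markup).
    resources = {
        "animal/big-cat":     ["wildlife-photos.example/jaguar"],
        "vehicle/car-brand":  ["car-reviews.example/jaguar-xk"],
        "food/fruit":         ["recipes.example/apple-pie"],
        "company/apple-inc":  ["news.example/apple-earnings"],
    }

    def personal_search(term, vocab):
        """Resolve a term through the user's own vocabulary, then fetch matching resources."""
        concept = vocab.get(term)
        return resources.get(concept, [])

    print(personal_search("jaguar", alice_vocab))  # Alice's agent finds the big cat
    print(personal_search("jaguar", bob_vocab))    # Bob's agent finds the car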

In summary, Nova Spivack and his company Radar Networks have made a great achievement in realizing the Semantic Web. We must congratulate them! Great work, Nova! On the other hand, unless they can show that their solution has properly solved the semantic-definition problem, the age of the Semantic Web is still not here yet.

Wednesday, April 04, 2007

Story of My Internship at DERI Innsbruck Last Summer


Today the CS department homepage at BYU posted a story about my internship at DERI Innsbruck last summer. It was indeed a very pleasant experience. DERI Innsbruck is currently one of the best Semantic Web research labs in the world. During that period I had wonderful working experiences with many DERI people, especially Martin Hepp, Ying Ding, Omair Shafiq, and Jan Henke. The article brings back many wonderful memories of DERI and Innsbruck. If any young Semantic Web researchers are looking for a place to do an internship right now, DERI Innsbruck is definitely a place they should try.