Sunday, September 20, 2009

My Impression of Twine 2.0

Radar Networks is about to release Twine 2.0. Its CEO, Nova Spivack, has given a presentation on the upcoming T2. I include the video below. It takes less than 5 minutes and is worth watching.



Twine is a service that I have studied constantly from the beginning. I wrote the first impression, the second impression, and Twine went public, which shared my analysis of the service before its beta, at the beginning of its invitation-only beta, and when it went public, respectively. I would like to continue this tradition with the news of the T2 release.

In short, based on Nova's talk in the released video, T2 moves Twine closer to the realm of Web search and away from Web bookmarking.

Although there is a natural connection between bookmarking and search, the distinction between semantic bookmarking and semantic search is significant. Usually, through bookmarking we store the Web pages in which we are interested. Then by search we may retrieve a particular Web page of interest from the pool of bookmarked pages. But semantic bookmarking asks for the organization of semantics only within a self-defined knowledge network, i.e., with respect to each individual user only. Semantic search, however, demands the organization of semantics across varied knowledge networks, i.e., across the varied perspectives of knowledge organization held by multiple users. This distinction dramatically increases the complexity of the problem. I am truly impressed by the new announcement. At the same time, however, I doubt how much of the problem Twine 2.0 can solve.

I feel that Twine has missed something critical in its progress. Below I will try to explain what I think it might have missed.

Within Twine, there are actually three layers of semantic representation, which I would call the layer of the twines, the layer of the individual knowledge networks, and the layer of the overall knowledge network. As the names suggest, the layer of the overall knowledge network represents the entire body of knowledge stored in Twine.com, the layer of the individual knowledge networks is composed of individual users' personal knowledge networks, and the layer of the twines is composed of all the individual twines defined in Twine.com.

Until now, Twine 1.0 has paid most of its attention to the layer of the twines. Based on Nova's video talk, Twine 2.0 will mostly be dedicated to the layer of the overall knowledge network. The missing part is thus the layer of the individual knowledge networks.

Back in my first impression of Twine, I highly praised Twine's effort to develop personal knowledge networks. They, rather than the twines, should be the foundation of Twine. Although it sounds ironic, it is indeed reasonable because the twines are nothing but the links among the individual knowledge networks. It is the individual knowledge networks, not the twines, that users are truly interested in.

Unfortunately, it seems that Twine jumps back and forth between the layer of the twines and the layer of the overall knowledge network, while forgetting the crucial one: the real foundation on which a best-performing Twine must rely.

On the other hand, it is much more feasible for Twine to develop a good business model if it does semantic search rather than semantic bookmarking. I guess that this is the real motivation underneath.

I am still looking forward to the service. However, I feel sorry that it chose the most difficult way forward rather than one of the less aggressive but more stable (and promising) approaches.

Monday, September 14, 2009

Outliers

“人之贤不肖譬如鼠矣,所在自处耳!” (李斯 (Li Si), 280 B.C. - 208 B.C.) [English Translation: Whether a man is noble or ignoble is like whether rats live here or there. It is where one lives that determines one's fate.]

I have finally finished reading Malcolm Gladwell's Outliers. The book is fabulous in its breadth and depth as well as in its writing. In the book, Gladwell argues that outliers (exceptional winners) succeed because of the environment they grow up in more than because of their inborn genius. To be successful requires only a certain degree of goodness (as opposed to absolute superiority) in IQ. Truly successful stories, however, depend heavily on the luck of the individual's growing-up environment.

Li Si, the Prime Minister of the King of Qin and later First Emperor of China (Qin Shi Huang), was once a minor official taking care of barns in Chu (another kingdom in China at the time). One day Li Si observed that the rats in the outhouse were dirty, hungry, and scared of any tiny unusual circumstance. At the same time, the rats in the barnhouse were clean, well fed, and much better at adapting to external changes in circumstance. Li Si then asked himself: were these rats born to be so different from each other? To get the answer, Li Si put some barnhouse rats outside and caught a few outhouse rats and moved them inside the barnhouse. After a few days, Li Si found that the original barnhouse rats had become dirty, hungry, and scared of any tiny unusual circumstance, while the original outhouse rats had become clean and well fed and had started to adapt easily to unusual external changes. It was then that Li Si spoke the sentence quoted at the beginning of this post. Having said so, Li Si quit the job, left Chu, and went to Qin. Eventually he became one of the most well-known Prime Ministers in the multi-thousand-year-long history of China.

Both Li Si and Malcolm Gladwell have emphasized the importance of the growing-up context to one's success. Few people are really incapable of being outliers. Yet few truly become outliers. The reason is not that the outliers are exceptionally smarter. It is only that the outliers happen to have the right context while growing up in which to become exceptionally good.

The difference between Li Si and Malcolm Gladwell, however, is also worth thinking about. The former emphasized that one could always invent a proper context in which to become an outlier. The latter, instead, emphasized that one should be ready to adopt the right alternate context to be successful. This distinction sets the two apart in their accomplishments. The former became one of the greatest Prime Ministers ever, who helped unite China for the first time in history. The latter, on the other hand, became an exceptional writer who is well known for being able to distill common rules from a variety of cultures.

Malcolm Gladwell tells us that one can never be exceptional without the right context. Li Si, by contrast, tells us that it is always possible to invent the right context for oneself when nature has not provided one. Only by taking the two pieces of advice together can any person (not necessarily a widely recognized outlier) lead a balanced life, being neither too proud nor too timid about his accomplishments.

Sunday, September 13, 2009

Gravitation, the Web, and Wikipedia

Gravitation

Gravitation is a natural phenomenon by which objects with mass attract one another. It is one of the most fundamental restrictions applied to every object in the universe. A universe without gravitation would diverge at random. According to the General Theory of Relativity, gravitation along with space and time together define the geometry of the universe. Philosophically, space allows the universe to contain things, time allows the things in the universe to grow and evolve, and gravitation makes the universe a reasonable system (fundamental rules can be established because gravity forces mass to converge).

World Wide Web

The World Wide Web is a man-made universe in which data is mass. Every object in the real universe is composed of mass. Every object in the Web is composed of data. Similarly, the Web as a universe also has space, time, and gravitation together defining its fundamental geometry, so that the Web can contain things, the things contained in the Web can evolve, and all things in the Web constitute a reasonable system.

Space in the Web is the unlimited expanse in which everything made of data is located, just as space in the real universe is the unlimited expanse in which everything made of mass is located. The entire volume of the space in the Web equals the overall capacity of the memory and hard disks in the Web's computers. The space in the Web expands constantly as we continue to add more machines to the Internet.

Time in the Web is inherited from our real universe. Through this dimension, and this dimension only, the man-made virtual universe is connected to the real universe. I feel astonished by this discovery myself. Basically it tells us that the Web would be totally lost (inaccessible) if time were to stop. The derivation is, however, reasonable, since it always takes some time (though possibly a very short period) for us to access the information stored in the Web through computation. When time stops, the computation on which the Web relies cannot be performed. Hence we might conclude that the Web indeed resides in the dimension of time.

Gravitation in the Web is certainly not due to the gravity in the real universe. As we have discussed, the decisive function of gravitation is to bring things toward each other. In the real world, gravity plays the role of pulling mass together. In the Web, semantics plays the role of pulling data together. Gravitation in the Web is thus semantics. Without semantics, data in the Web will fundamentally diverge. With semantics, however, data in the Web essentially converges. Semantics is also the fundamental force that makes the Web reasonable.

Wikipedia

The rise and popularization of Wikipedia is a phenomenon. But why Wikipedia? Certainly there are many reasons. Here I would like to look at the phenomenon from the perspective of Web Science, using the gravitation in the Web we just discussed.

Unquestionably, Web 2.0 has reshaped the Web into a social platform. A representative characteristic of Web 2.0 is the prevalence of user generated content (UGC) arising from all kinds of online social activities. The variety of UGC makes the Web more and more exciting. But there is a problem.

In order to make UGC production faster and more efficient, most UGC is not self-explanatory in the way the Web pages produced in the pre-2.0 age were. Instead, UGC relies heavily on external references to establish the common ground for mutual communication. For example:

Party A: The UGC in site A is laid out ugly.
Party B: We use UGC to provide geographical information.


Did the two parties talk about the same UGC? (Actually they did not. The first party talked about User Generated Content, while the second party talked about Universal Geographic Code.)

Such a problem was generally not a problem in the pre-2.0 Web, when there was very little demand for mutual online communication between Web content publishers and Web readers. The majority of webmasters thus had enough time and were professional enough to make their Web content self-contained. That is, the key terms were always unambiguously defined to avoid potential misunderstanding.

The rise of Web 2.0 broke the scheme. Suddenly the demand for producing UGC became tremendous. Most UGC producers either are non-professionals in the domain of the content they produce or lack the time or patience to establish unambiguous ground for their messages. To sustain this new scheme, however, the Web demands a commonly shared place of semantic grounding for people to reference. This is one intrinsic reason behind the rise of Wikipedia; Wikipedia happened to meet the demand in time.

Using the model of Web gravitation, we can summarize the whole discussion of Wikipedia so far in a fairly concise but illuminating way. The Wikipedia phenomenon tells us that in the age of Web 2.0 the gravity in the Web has started the process of data solidification. After the Big Bang, gravity started to pull mass together and solidify it to produce the stars and planets. In the Web, the process is similar. The various domain-specific Web sites are the planets, while the sites that engage in the formalization of semantics, like Wikipedia, are the stars around which the planets orbit. Although all the sites are built by humans, it is actually the gravitation in the Web (semantics) that intuitively guides the construction of all these sites. Data with closely related semantics moves toward each other, and new sites emerge.

The excitement of Web evolution has just started.



Special thanks to Eric Goldman for sharing with me his work on the Wikipedia study. The initial thought behind this post arose during an email discussion with him about the future of Wikipedia. Eric's latest article, "Wikipedia’s Labor Squeeze and its Consequences," is an excellent work, and I recommend it to anybody who is interested in research on Wikipedia as well as research on the fate of social media.

Friday, September 11, 2009

ISWC 2009


I am going to attend ISWC 2009 (The 8th International Semantic Web Conference) this year in Washington, D.C. If you will be at the same event, I would like to meet you and have a chat. Please leave me a comment or drop me an email.

Monday, September 07, 2009

The real-time web in a nutshell for Web developers and researchers

The "real-time web" is mentioned more and more often in various discussions. For example, ReadWriteWeb recently posted a three-installment series, The Real-Time Web: A Primer. In this post, I would like to share some of my thoughts on this new jargon in a concise way. Primarily, the post explains the real-time web from the perspective of Web research, unlike the ReadWriteWeb series, which targets general non-professional readers.

1) The real-time web is a web of information produced in real time.

The statement makes two points: (a) the real-time web is a data web; and (b) the information in the real-time web is genuine.

The two points are equally important. First, unlike the World Wide Web in general, which contains plenty of services and connective links as well as data, the real-time web is primarily a web of data. Until now, no specific web services have been designed for the real-time web (and indeed they are unlikely to be a necessity anyway). The dominant majority of the web links in the real-time web are referential, pointing to the details of the data mentioned, rather than connective among varied semantics in the web.

Second, the information the real-time web produces is generally genuine. This means that the semantics produced in real-time-web data is often new in the Web (i.e., it has never occurred before and cannot be found elsewhere). This implicit hint carries a tremendous amount of value. For example, once we know when a piece of semantics was first coined in the Web, the difficulty of semantic search in the Web is significantly reduced.
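To make the idea of a "start time" concrete, here is a minimal sketch of recording when each term first appears in a stream of timestamped messages. Everything in it is an assumption made for illustration: the function name `first_seen_index`, the (timestamp, text) message shape, and the naive whitespace tokenization are hypothetical, and a real system would need far better term extraction.

```python
from datetime import datetime, timezone


def first_seen_index(messages):
    """Map each lowercase term to the earliest timestamp at which it appears.

    `messages` is an iterable of (timestamp, text) pairs in any order.
    Tokenization is a plain whitespace split (an illustrative assumption).
    """
    index = {}
    for timestamp, text in messages:
        for term in set(text.lower().split()):
            if term not in index or timestamp < index[term]:
                index[term] = timestamp
    return index


# Hypothetical sample data, not taken from any real service.
messages = [
    (datetime(2009, 9, 7, 12, 0, tzinfo=timezone.utc), "twine 2.0 announced"),
    (datetime(2009, 9, 7, 9, 30, tzinfo=timezone.utc), "rumor about twine"),
]
index = first_seen_index(messages)
print(index["twine"].isoformat())
```

With such an index, a search for a term could in principle skip everything published before the term's first-seen time, which is one way to read the reduction in difficulty mentioned above.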

2) The real-time web is built upon a network of instant messaging.

Not necessarily. But in reality the real-time web has been developed as a network of instant messaging. Twitter plays a crucial role in this migration. It is Twitter that invented the 140-character limit in real-time data production. This invention ties data production in the real-time web tightly to the technology of instant messaging, since the latter favors the production of the former. By contrast, the real-time web could have taken a very different form had it been led by other companies such as CNN (in which case the real-time web could have been a network of larger chunks of data integrated with complex services but carrying timestamps).

3) The real-time web is a subset of World Wide Web.

The real-time web is not the next stage of the World Wide Web. It is just a newly emerged subset of the Web. It is especially important for Web developers to recognize the distinction so that they do not misinterpret the evolution of the World Wide Web.

4) The real-time web is a web of heavily overloaded information, full of duplicated data.

The percentage of duplicated data (as well as duplicated semantics/meaning) is significantly greater than in the rest of the Web. Very often the same data (or the same semantics) repeats itself extremely frequently within a short time frame in the real-time web. Hence any attempt to consume real-time-web data must be carefully designed to handle this unusual environment, which is quite different from handling other Web data.
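One simple way to cope with this duplication is to suppress a message when the same text was seen a short while ago. The following is a minimal sketch under stated assumptions: the function name `drop_recent_duplicates`, the one-hour default window, and the choice to treat messages as duplicates when they match after lowercasing and whitespace normalization are all hypothetical. It catches only verbatim repeats, not the semantic duplicates mentioned above.

```python
def drop_recent_duplicates(messages, window_seconds=3600):
    """Yield (timestamp, text) pairs, skipping recent repeats of the same text.

    `messages` must be sorted by timestamp (in seconds). A message is skipped
    when the same normalized text occurred less than `window_seconds` before it;
    every occurrence, kept or skipped, restarts the window for that text.
    Note that `last_seen` grows without bound in this sketch.
    """
    last_seen = {}
    for timestamp, text in messages:
        key = " ".join(text.lower().split())  # normalize case and spacing
        previous = last_seen.get(key)
        last_seen[key] = timestamp
        if previous is not None and timestamp - previous < window_seconds:
            continue
        yield timestamp, text


# Hypothetical stream: the second message repeats the first within the window.
stream = [
    (0, "Breaking: Twine 2.0 demo"),
    (60, "breaking:  twine 2.0 demo"),
    (120, "Something else"),
    (4000, "Breaking: Twine 2.0 demo"),
]
kept = list(drop_recent_duplicates(stream))
print([timestamp for timestamp, _ in kept])
```

Restarting the window on every occurrence means a burst of repeats is collapsed into its first message, while the same text reappearing after a quiet period is treated as news again.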

5) The real-time web is a web of uncooked information.

The real-time web shows instinctive human consciousness, whereas the rest of the Web shows human memory. Again, this distinction implies that different data mining technologies are required for handling data in the real-time web.

6) The real-time web is not a new form of communication.

I disagree with the argument that the real-time web is a new form of communication. The argument not only misrepresents the essence of the real-time web but also steers readers away from the proper use and implementation of the real-time web.

The real-time web is not a form of interpersonal communication; instant messaging is. The real-time web is a platform for instant messages. Why is the distinction critical? The two views of the real-time web make significantly different demands on security and data integrity. If we treat the real-time web as a form of communication, we need to focus on restricting access to personal information. By contrast, if we treat the real-time web as a platform for instant messages, we need to pay more attention to guaranteeing the freedom of information broadcasting. Trying to mix the two fairly contradictory purposes can only add unnecessary complexity to the development of the real-time web.

The correct attitude is that (a) we need to have a public and free real-time web, and (b) we may need to invent better forms of private communication within the real-time web.

7) The real-time-web information decays.

Information in the real-time web decays significantly faster than information stored in the regular Web. Once a piece of information has decayed, it is no longer meaningful to use. This fact implies that, if a piece of information is truly worth preserving, we need to invent a channel that transports it from real-time-web accessibility to regular-web accessibility in order to stop the process of decay. There is a lot more work to do in order to truly facilitate the real-time web.
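As a rough illustration of such a channel, the sketch below models decay as an exponential half-life and selects items that have decayed but still look worth moving to the regular Web. All of it is assumption: the half-life model, the one-hour default, the thresholds, the use of a mention count as a stand-in for "worth preserving", and the names `relevance` and `select_for_archive` are hypothetical choices for the sake of the example.

```python
def relevance(age_seconds, half_life_seconds):
    """Exponential decay: relevance halves every `half_life_seconds`."""
    return 0.5 ** (age_seconds / half_life_seconds)


def select_for_archive(items, now, half_life_seconds=3600.0,
                       min_relevance=0.25, min_mentions=3):
    """Return texts that have decayed in real time but seem worth preserving.

    `items` is an iterable of (created_at, text, mentions) with times in seconds.
    An item is selected when its relevance has fallen below `min_relevance`
    and it was mentioned at least `min_mentions` times (an assumed proxy).
    """
    selected = []
    for created_at, text, mentions in items:
        decayed = relevance(now - created_at, half_life_seconds) < min_relevance
        if decayed and mentions >= min_mentions:
            selected.append(text)
    return selected


# Hypothetical items: (created_at, text, mentions).
items = [
    (0, "old and widely repeated", 5),
    (9000, "fresh and widely repeated", 5),
    (0, "old and ignored", 1),
]
archived = select_for_archive(items, now=10000)
print(archived)
```

Only the first item is both decayed and frequently mentioned, so it alone would be handed over to the regular Web in this sketch.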

Summary

Do not underestimate the production of the real-time web. Do not overestimate the value of the real-time web. Think differently about the real-time web.