Sunday, September 13, 2009

Gravitation, the Web, and Wikipedia


Gravitation is a natural phenomenon by which objects with mass attract one another. It is one of the most fundamental constraints applied to every object in the universe. A universe without gravitation would diverge at random. According to the General Theory of Relativity, gravitation together with space and time defines the geometry of the universe. Philosophically, space allows the universe to contain things, time allows the things in the universe to grow and evolve, and gravitation makes the universe a coherent system (fundamental rules can be established because gravity forces mass to converge).

World Wide Web

The World Wide Web is a man-made universe in which data is the mass. Every object in the real universe is composed of mass; every object in the Web is composed of data. Similarly, the Web as a universe also has space, time, and gravitation together defining its fundamental geometry, so that the Web can contain things, the things contained in the Web can evolve, and all things in the Web constitute a coherent system.

Space in the Web is the unlimited expanse in which everything made of data is located, just as space in the real universe is the unlimited expanse in which everything made of mass is located. The entire volume of the space in the Web equals the overall capacity of the memory and hard disks in the Web's computers. The space of the Web constantly expands as we continuously add more machines to the Internet.

Time in the Web is inherited from our real universe. Through this dimension, and this dimension only, the man-made virtual universe is connected to the real universe. I was astonished by this discovery myself. Basically, it tells us that the Web could be totally lost (inaccessible) if time were to stop. The derivation is, however, reasonable, since it always takes some time (though possibly a very short period) for us to access the information stored in the Web through computation. When time stops, the type of computation the Web relies on cannot be performed. Hence we might conclude that the Web indeed resides in the dimension of time.

Gravitation in the Web is certainly not due to the gravity of the real universe. As we have discussed, the decisive function of gravitation is to bring things toward each other. In the real world, gravity plays the role of pulling mass together. In the Web, semantics plays the role of pulling data together. Gravitation in the Web thus is semantics. Without semantics, data in the Web would fundamentally diverge; with semantics, it essentially converges. Semantics is also the fundamental force that makes the Web a coherent system.


The rise and popularization of Wikipedia is a phenomenon. But why Wikipedia? Certainly there are many reasons. Here I would like to look at the phenomenon from the perspective of Web Science, using the gravitation of the Web we just discussed.

Unquestionably, Web 2.0 has reshaped the Web into a social platform. A representative characteristic of Web 2.0 is the prevalence of user-generated content (UGC) produced by all kinds of online social activities. The variety of UGC makes the Web more and more exciting. But there is a problem.

To make UGC production faster and more efficient, most UGC is not self-explanatory the way Web pages produced in the pre-2.0 age were. Instead, UGC relies heavily on external references to settle the common ground of mutual communication. For example:

Party A: The UGC in site A is laid out ugly.
Party B: We use UGC to provide geographical information.

Did the two parties talk about the same UGC? (Actually they did not. The first party talked about User Generated Content, while the second party talked about Universal Geographic Code.)
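The ambiguity above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the glossary, context labels, and `resolve` function are all my own invention, not a real API): without a shared, context-aware grounding, the same acronym silently expands to different meanings.

```python
# Hypothetical shared glossary mapping (acronym, context) to a meaning.
# A commonly referenced grounding (like Wikipedia's disambiguation pages)
# plays exactly this role for human readers.
GLOSSARY = {
    ("UGC", "web"): "User Generated Content",
    ("UGC", "mapping"): "Universal Geographic Code",
}

def resolve(acronym, context):
    """Expand an acronym against the shared glossary for a given context.

    Falls back to the raw acronym when no grounding exists --
    which is precisely when misunderstandings happen.
    """
    return GLOSSARY.get((acronym, context), acronym)

print(resolve("UGC", "web"))      # Party A's meaning: User Generated Content
print(resolve("UGC", "mapping"))  # Party B's meaning: Universal Geographic Code
```

The point of the sketch is that the two parties only notice the conflict if both consult the same glossary; each party's local expansion looks perfectly consistent on its own.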

Such a problem was generally not a problem in the pre-2.0 Web, when there was very little demand for mutual online communication between Web content publishers and Web readers. The majority of webmasters thus had enough time, and were professional enough, to make their Web content self-contained. That is, the key terms were always unambiguously defined to avoid potential misunderstanding.

The rise of Web 2.0 broke this scheme. Suddenly, the demand for producing UGC became tremendous. Most UGC producers are either non-professionals in the domain of the content they produce, or they lack the time or patience to settle an unambiguous ground for their messages. To sustain this new scheme, however, the Web demands a commonly shared place of semantic grounding for people to reference. This is an intrinsic reason behind the rise of Wikipedia: Wikipedia happened to meet this demand at the right time.

Using the model of Web gravitation, we can summarize all the discussion of Wikipedia so far in a fairly concise but illuminating way. The Wikipedia phenomenon tells us that in the age of Web 2.0, the gravity of the Web has started the process of data solidification. After the Big Bang, gravity pulled mass together and solidified it into stars and planets. In the Web, the process is similar. The various domain-specific Web sites are the planets, while the sites that engage in the formalization of semantics, like Wikipedia, are the stars around which the planets circulate. Although all these sites are built by humans, it is actually the gravitation of the Web (semantics) that implicitly guides their construction. Data with closely related semantics moves together, and new sites emerge.

The excitement of Web evolution is just beginning.

Special thanks to Eric Goldman for sharing his work on Wikipedia with me. The initial idea for this post arose during an email discussion with him about the future of Wikipedia. Eric's latest article, "Wikipedia's Labor Squeeze and its Consequences," is excellent work, and I recommend it to anybody interested in research on Wikipedia as well as on the fate of social media.


Kingsley Idehen said...


Another great post!

If you haven't already, please see my old post titled: Metcalfe, Einstein, and Linked Data :-)

Yihong Ding said...


Thank you.

I had read your post before, but reading it again this time, I must say I learned more from it.

I agree with you that there must exist some fundamental equivalence, similar to E=mc², in the Web. What is it? It is hard to tell at present.

About the form of "energy" in the Web, here is my thought. In our real universe, mass is used to evaluate the amount of an object, and energy is used to measure an object's capability of production. Similarly, in the Web, data can be used to evaluate the amount of an object, and the computational output of that data can be used to measure its capability of production.

Recall that in my model of Web evolution, I used the term "quality of Web resource" to measure the "energy" of a Web object and the term "quantity of Web resource" to measure its "mass." The entire goal of Web evolution is to let Web resources evolve toward containing greater quality per unit quantity.

Can we apply an E=mc²-style equation to mathematically compute the conversion? I do not know at present. But your insight is truly illuminating.