Sunday, January 21, 2007

Evolution of the World Wide Web: announcing a new web article

Here is a new online article about web evolution.

URLs: (prelude) (part 1)

As we know, the World Wide Web evolves. Especially in recent years, as research on the Semantic Web moves forward quickly, the emergence of Web 2.0 has brought a new wave of new-generation web applications. But what is behind all of these phenomena? This article is an attempt to explain them, and from these explanations we can predict the future evolution of the web.

In this article, I make an analogy between web evolution and the growing up of human generations. In fact, these two evolutionary paths are amazingly similar to each other. With this analogy, we can explain many current debates: for example, why Web 2.0 is a revolutionary new stage rather than a piece of jargon, and why it is inappropriate to name the Semantic Web "Web 3.0". Based on these explanations, we can then watch clearly how the web evolves.

In the meantime, I want to particularly dedicate this article to the Web Science initiative. I believe this initiative is a landmark: the World Wide Web has become an objective existence that is independent of human society. The World Wide Web has intrinsic laws (which are the focus of Web Science research) that control its evolution. Although we may think that humans control all these processes, in fact we do not. We humans invented the web and built the web. But once the web is built, it becomes an objective existence. More importantly, this existence has become so powerful and influential that it follows its own laws. This is what the Web Science initiative tells us.

As a result, we researchers need to be aware that our duties are starting to change. Previously, the majority of our work was to create rules to make the web work. As the web matures, more and more of web research will be about discovering the intrinsic laws of the web and how these pseudo-natural laws may guide us to consume web resources effectively. As an interesting observation, Web 2.0 practice tells us that the web grows by itself when we consume its resources, because the process of consuming web resources is a process of producing new web resources. This observation leads to a certain conclusion: unless we stop using the web, we cannot stop the evolution of the web; and this process of web evolution is not controlled by any small group of people but by the behavior of all humanity. Therefore, web evolution itself becomes a natural process, or at least a pseudo-natural one, since humans need to participate.

"The thing that hath been, it is that which shall be; and that which is done is that which shall be done: and there is no new thing under the sun. Is there any thing whereof it may be said, See, this is new? it hath been already of old time, which was before us." (quoted from the Bible, Ecclesiastes 1:9-10)

The web is a clone of our society. From its beginning, people have used it as an extension of their lives. On the web we clone ourselves, not physically but virtually. We leave all kinds of information about ourselves on the web: what we believe, what we care about, what we are interested in, how we live, whom we love, and whom we dislike. The web records everything, and through these records it can rebuild us from our knowledge, our interests, our friendships, and everything else short of physically cloning us. It is true that the current web is not yet so powerful. But that is the future, and this article discusses that future.

This article is planned to contain three parts: the past and present of the web, the future in dream, and inventing the future. Currently, I have finished the first draft of Part 1. Parts 2 and 3 will be online soon. I sincerely welcome any comments and discussion about this topic. For comments, critiques, and discussion, please drop me an email at "" (the Web 1.0 method) or leave a comment on my blog here at "" (the Web 2.0 method).

URLs: (prelude) (part 1)

Thank you very much for reading this post and my article.


Vlad Chernyshov said...

Hi, Yihong!
Great article! Maybe even the best article about the Semantic Web! I can't wait for you to continue it!

Well, your analogy with the human growing-up process seems interesting to me. Maybe it's suitable not only for the web, but for all technologies (as a genesis process).

I came across the Semantic Web not long ago, and I have some questions I'd like to discuss.

1. Why is the Semantic Web emerging MUCH slower than the web itself did?

2. If one big ultimate ontology is impossible, then how can I do something very complex on the Semantic Web?
And how will agents discover new ontologies? From centralized servers??? That's not a good idea.

3. How can we avoid incorrectness in ontologies?

4. And who will build this HUGE collection of ontologies?

5. If a user's personal agent is doing some bandwidth-hungry task, and there are around 1 billion agents, how can the web bear this?

6. What is your web search model for the Semantic Web? This is interesting.

7. And, god damn it, why are almost all semweb tools written in Java!? :)

Recently I was discussing the Semantic Web with a friend of mine. And he told me: "Vlad, the Semantic Web, as I understand it, is a Knowledge Base (KB). But it is an enormous amount of work to put all human knowledge into ontologies. Heh! We need to take all the scientists on the planet and put them together on some island! And then, I hope, in 10 years the job will be done!"

I responded that we have Wikipedia, which was considered impossible until recently.

Now we see more and more vendors beginning to offer APIs, and we have a lot of mashups built on them. But these APIs aren't built on the same standards, so it's very difficult right now to create a system that could use them all in a simple way... And of course I have to register for an API key. That's a huge limitation for using APIs on the fly.

Wanna hear your comments!)


Yihong Ding said...

Hi Vlad,

Thank you for your questions. I think all of them are very good ones. I cannot say that I know the precise answer to any of them, but I can try to clarify what I believe about the realization of the Semantic Web.

1. The slow realization of the Semantic Web is mainly due to the requirement of huge collections of formal knowledge. Certainly there are many other reasons, but to me this is the most crucial one. On the current web, by contrast, everybody can upload anything without worrying about whether it is understandable by machines. The requirement of machine processability is the most difficult thing to build from scratch.

2. I totally agree that it is not a good idea to build centralized servers to hold ontologies. Technically, surely we would face many difficulties. But the real problem is on the political side. We cannot allow any organization, whether a government, a company, a non-profit, or anything else, to gain the capability to control how humans can think, present themselves, and be understood. By the way, this is why I believe a "Semantic Google" will not replace Google. And if there were a "Semantic Google," it would definitely not be as influential and successful as the current Google.

So the question is, how do we build these ontologies? I believe it will be a collaborative, and (probably more importantly) a progressive, process. Everyone can build ontologies based on their interests. Whether these ontologies are accepted, however, may depend entirely on how good they are and how popular their creators are among web users.

3. Philosophically, there is NO ontology that is incorrect. I guess you can understand this point. Most of the time, "correct" is a subjective term. Incorrectness is only something we do not agree with. But if someone else accepts this "unpopular" view, it becomes correct to those people. The Semantic Web must allow this variety. So there must be popular and unpopular ontologies. But incorrect ontologies? I guess there are no incorrect ontologies.

4. All web users.

5. This is a very good question. But you may have overlooked one issue. For example, if 100 people want to pass through the same door at the same time, that is a big problem. But if those 100 people run in different directions simultaneously in a building, there may be no crowding at all. Similarly, every one of us has our own interests. There are overlaps, but in general we are running in varied directions. So this problem may not be as serious as you thought.

6. I will discuss my search model on the Semantic Web in Part 2. ;-)

7. I don't know whether it must be Java. But Java is certainly a very good language for web programming. And there are many other options too, such as AJAX (JavaScript), Ruby, PHP, etc.

As I discussed in my article, the realization of the Semantic Web will be a process like a human's growing up. It means that this web cannot be built even if we do take all the scientists on the planet and put them together on some island! Can a child suddenly become a well-educated adult by being forced to learn all the books in the world? No, it takes time. And time is important not only to build knowledge, but to allow children to really understand knowledge. I foresee that knowledge understanding is not a process that can be done in a minute (as in many current computer programs). It is more like the process of training a machine agent. We need patience and time to input knowledge and to allow machines to understand it. What this process needs is not scientists. What this process really needs is the time and patience of every ordinary web user.
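The "machine processability" requirement from answer 1 can be made concrete with a toy sketch. This is not real RDF tooling, and every name and predicate below is invented for illustration; it only shows why knowledge must be encoded as explicit statements before a machine can query it:

```python
# Toy illustration of machine-processable knowledge: a fact a human reads
# in prose ("Yihong wrote an article about web evolution") must become
# explicit subject-predicate-object triples before a machine can use it.
# All names and predicates here are invented examples, not a real ontology.

facts = {
    ("Yihong", "wrote", "WebEvolutionArticle"),
    ("WebEvolutionArticle", "topic", "WebEvolution"),
    ("Vlad", "commentedOn", "WebEvolutionArticle"),
}

def match(pattern, store):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# A machine can now answer "what did Yihong write?" mechanically:
print(match(("Yihong", "wrote", None), facts))
```

Free-text pages give a machine nothing to `match` against; that gap between prose and explicit triples is exactly the formal-knowledge cost discussed in answer 1.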

Vlad Chernyshov said...

Hi, Yihong!

Thank you very much for your answers.

Yes, I agree that we need huge collections of formal knowledge. I hope this problem will be solved technically by emerging tools like (Nova Spivack's company). They promise to show their first product in 2007.

RDF and OWL are much more complex than HTML. And we need development environments for Semantic Web languages like the ones we have for HTML and other Web 1.0 and Web 2.0 languages (Dreamweaver, etc.).
And maybe I should mention that RDF and OWL became W3C Recommendations only in 2004...

I totally agree that ontologies (or whatever they are called) must be developed in a collaborative way, like Wikipedia.
And I also agree that it will be a progressive process: from the simplest (making it possible to do general tasks) to the complex.

You said that whether these ontologies are accepted depends on how "good" and popular they are... But if I'm running an automatic agent, I don't care which ontology it uses.

About 'Semantic Google' and web search itself: I believe that crawling isn't a good way to build search engines. It seems to me like something unnatural, an ugly concept: wasting time, bandwidth, and computational resources to crawl every page whether it was updated or not. It's inefficient.
And of course treating web resources as a collection of pages has become obsolete. Even now it is useless to crawl a site written entirely with AJAX.

But for now the only alternative to crawling I see is some sort of pinging (like for ex.). Do you have any ideas?

About incorrectness. Well... I mean, what about trust? :) Of course there can't be an "incorrect ontology," because every interpretation of human knowledge depends on some agreement between people.

Well, maybe bandwidth isn't as serious a problem as I think. But consider the following case: we have some extremely exciting and useful web service, and it is used by 100 million users. Then the service can be invoked millions of times per second. So we have a DoS attack!)))
I have a real example. Flickr offers an API to access photos. Suppose I want a new Flickr photo on the plasma screen on my wall every 3 seconds, and a huge crowd wants the same. Then for 100 million users we have approximately 33,000,000 requests per second!
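The back-of-the-envelope load figure above is easy to check; the inputs (100 million users, one photo request every 3 seconds) are the hypothetical numbers from the scenario, not measurements:

```python
# Check the Flickr-wallpaper estimate: 100 million users, each fetching
# one photo every 3 seconds (hypothetical numbers from the scenario above).
users = 100_000_000
seconds_between_requests = 3

requests_per_second = users / seconds_between_requests
print(f"{requests_per_second:,.0f} requests/second")  # about 33 million
```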

Yes, the realization of the Semantic Web takes time. But that doesn't mean we can't get immediate benefits from the process.
Right now we need several exciting and useful use cases for the Semantic Web (or related technologies) to attract a critical mass.

I'm thinking about the web as a programming platform where web services, programs, web sites, agents, and other entities can collaborate and be composed dynamically into programs on the user's behalf.
The early stage of this is mashups. People are getting excited about them. There are more than 1500 mashups on the web; I have written several myself.
But we need to integrate them all to achieve real power.

And this is my goal. I'm working on this.

So I can't wait for your Part 2! :)

Yihong Ding said...

Hi Vlad,

It is my pleasure to have this discussion with you. Things are getting more and more interesting.

Nova Spivack is a respected researcher and also a great pioneer of realizing the Semantic Web. I have also read in his articles that he plans to release some practical Semantic Web tools (especially practical semantic annotation tools) in 2007. But I doubt that they will be accompanied by a large collection of ontologies. Ontology creation is still a hard problem. Actually, it is not just hard. The real problem is that nobody really believes that the ontologies they create are useful at present. In reality, the majority of ordinary web users do not even have any idea what ontologies are. How can we expect them to read, understand, and then use these ontologies? Therefore, there is no real motivation to develop ontologies for the real world. The only explanation is that the time for the Semantic Web is not yet ripe.

Similar to the way people tend to adopt popular tags to label their articles on Web 2.0, popular ontologies can definitely attract more advocates on the Semantic Web. Users do care about which ontologies they adopt to annotate web pages. In fact, if they annotate their pages with popular ontologies, those pages become easier for other machines to understand. Otherwise, there is the extra overhead of ontology matching. No matter how good our ontology-matching technologies become, this is still overhead, and it will affect the popularity of the annotated pages. This is why I said that whether an ontology is popular does matter.

For web search, whether semantic or not, crawling is still a necessary step. But I totally agree with you that we must reduce the amount of crawling to improve the performance of web search. Divide-and-conquer can be a good solution. If we can find a way to divide the entire search task into small tasks, then we can definitely get better performance. The problem, however, is how to divide. And this is one issue I am going to discuss in Part 2.
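Since "how to divide" is deferred to Part 2, the following is only my own illustration of one common partitioning scheme, not the author's model: hash each URL's host to a crawler shard, so every site belongs to exactly one shard and shards never overlap.

```python
import hashlib
from urllib.parse import urlparse

N_SHARDS = 8  # hypothetical number of crawler machines

def shard_for(url: str) -> int:
    """Assign a URL to a crawler shard by hashing its host, so all pages
    of one site land on the same shard and no two shards overlap."""
    host = urlparse(url).netloc
    digest = hashlib.sha1(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % N_SHARDS

for u in ["http://example.org/a", "http://example.org/b", "http://w3.org/"]:
    print(u, "-> shard", shard_for(u))
```

Hashing by host keeps a site's pages together, but the harder problem the article points at is dividing the search task by meaning rather than by address, which hashing alone cannot do.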

Sorry that I am too busy to release it soon. I only write this article in my free time, and there are too many ideas I need to compose into a coherent theory. It is not easy work. But I will let you know as soon as I finish it.

By the way, thank you very much for pointing me to that site. It is very interesting. Also, I am glad to know you have a lot of experience implementing mashups. Although I know the theory, I have not done it myself. Maybe I can ask for your help with some mashups later. ;-)

Vlad Chernyshov said...

Hi Yihong,

It's my pleasure too.

Normal web users shouldn't have to know about ontologies at all, I think.

There is no real motivation to develop ontologies because they are useless without software that uses them.

By the way, Frédérick Giasson (I'm corresponding with him now), the creator of that site, is now developing a Music Ontology.

He also wrote about it on his blog.

Also there is a site which allows exporting music data in RDF (well, it did until recently, as I understand; now they don't...).

Hmm... that's an idea! Let's share semweb-related links! I will post all the semweb links I know on my blog.

Yeah, I know crawling is necessary for now. By the way, ping services, such as pingthesemanticweb, can improve performance a little by letting you know when to crawl. But it's only a half-solution.

Today I received news about the STAIR project (STanford Artificial Intelligence Robot) at Stanford University. Their goal is to build a home-helper robot.

I thought that using semweb technologies could greatly improve STAIR's intelligence. And this applies to any robot in the future. To be able to perform really useful tasks for humans, a robot MUST have some kind of semantic technology inside (ontologies, for example). So I wrote a letter to Assistant Professor Andrew Ng, STAIR's lead.

Here it is:

"Mr. Ng,

I'm very excited with your STAIR robot!
I'm not a specialist in AI, but after viewing this page I thought that Semantic Web technologies could help your team make your robot more intelligent. I understand that you are mostly concentrating on mechanical and recognition aspects rather than logical reasoning and decision making, but I hope you'll find this information useful.

Wish you success,

Vlad Chernyshov
Novosibirsk State Technical University "

And Andrew was very kind to answer me:

"Hello Vlad,

Thanks for pointing this out to me [...] but ultimately we decided to work on
low-level control and lower-level recognition etc. first, and then only
build up over time to higher-level reasoning and decision making. In
particular, I think one of the mistakes of the field of AI has often been
to work on high-level reasoning before the low-level problem has been
solved, which resulted in high-level reasoning systems that're very
difficult to apply to anything. (E.g., it's of little use to get STRIPS
to reason about whether to pick up the coffee or to fetch the mail first,
when you don't yet have a robot that can physically pick up a coffee cup.)

But over the long term, I think semantic web will play a large role in
getting STAIR to reason, though!"

And one more piece of news.

Recently I found some guys who developed the world's first open-source mobile platform. They call it OpenMoko. It's fully based on Linux. And that means not only can one run native Linux programs on it, but also a great amount of other cool stuff.

Linux on a mobile phone (well, on a communicator to be precise)!!! WOW! I want it!)

With this open-source Linux platform it will be much easier to implement various technologies and apps (particularly semweb ones).

For example, it is now possible to easily implement CC/PP (a description of device capabilities and user preferences, often referred to as a device's delivery context, which can be used to guide the adaptation of content presented to that device; this is a W3C activity), as Tim Berners-Lee envisioned in his great article here.

And the OpenMoko team placed a FOAF profile on their blog! I found that interesting. It's a new site, and it uses FOAF. I think this trend will grow.
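A FOAF profile like the one OpenMoko published is just a few lines of RDF. The sketch below emits a minimal profile in Turtle; the FOAF namespace URI is the real one, but the name and homepage are invented placeholders, not OpenMoko's actual data:

```python
# Emit a minimal FOAF profile in Turtle. The foaf: namespace URI is the
# real FOAF vocabulary; the person's name and homepage are placeholders.
def foaf_profile(name: str, homepage: str) -> str:
    return "\n".join([
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        "[] a foaf:Person ;",
        f'   foaf:name "{name}" ;',
        f"   foaf:homepage <{homepage}> .",
    ])

print(foaf_profile("Example Person", "http://example.org/"))
```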

You know, Yihong, I've realized there are so many interesting places on the Web! Even too many to read them all!

Despite the fact that feed aggregators and services such as Netvibes help people very much, they can only narrow down all the news on the Web. And there can be things uninteresting to a user even within one feed.

I want to receive only the feed items I'm interested in, because it takes too much time to get rid of all the rest.

:) I think you're overestimating my mashup experience, but it will be a pleasure for me to collaborate with you.

Vlad said...

Hi, Yihong!
I'm a math student at UNAM in Mexico City. I love your work, and I'm thinking about how to give your theory a mathematical structure, but I believe I need more info to connect the theory with the topic. Can you help me with some resources to read more? Thanks

Yihong Ding said...


Thank you, and I am glad to help. Actually, I am also thinking about how to mathematically formulate this web evolution model. I would be glad to have you work together with me.

Please let me know what kind of resources you are looking for.


Anonymous said...

I think you will come to the correct decision.