Friday, November 24, 2006

Ultra-scale Information Management: no place for rigid standards

This year's ER conference (International Conference on Conceptual Modeling) was held in Tucson, Arizona, from Nov. 6 to Nov. 9. This year's ER was special because it marked the conference's 25th anniversary, so the organizers prepared a few special events. To me, however, the best part of ER 2006 was listening to a great talk given by Scott Renner from the MITRE Corporation.

Scott's talk was titled "Community Semantics For Ultra-Scale Information Management." Here is the abstract.

The U.S. Department of Defense (DoD) presents an instance of an ultra-scale information management problem: thousands of information systems, millions of users, billions of dollars for procurement and operations. Military organizations are often viewed as the ultimate in rigid hierarchical control. In fact, authority over users and developers is widely distributed, and centralized control is quite difficult – or even impossible, as many of the DoD core functions involve an extended enterprise that includes completely independent entities, such as allied military forces, for-profit corporations, and non-governmental organizations. For this reason, information management within the DoD must take place in an environment of limited autonomy, one in which influence and negotiation are as necessary as top-down direction and control.

This presentation examines the DoD’s information management problems in the context of its transformation to network-centric warfare (NCW). The key tenet of NCW holds that “seamless” information sharing leads to increased combat power. We examine several implications of the net-centric transformation and show how each depends upon shared semantic understanding within communities of interest. Opportunities for research and for commercial tool development in the area of conceptual modeling will be apparent as we go along.

The promise of successful information sharing is realized "when the right information is provided to the right people at the right time and place so that they can make the right decisions" [1]. As for delivering these five rights (right information, right people, right time, right place, and right decision), the DoD's experience has shown that top-down standardization can achieve partial success but fails overall in ultra-scale information management systems. Once an information system grows large enough, a single vocabulary simply does not work any more, even in an environment as rigidly organized as the military.

This conclusion leads straightforwardly to another prediction. Consider the scale of the World Wide Web, where (1) the number of web users is much greater than the number of soldiers, (2) the domain is far larger and more complicated than the military domain, and (3) no authority can enforce anything across the Web. At that scale, there is no chance of success if we try to design a single standard for global information sharing, such as the Semantic Web.

Information is more than data. Scott made an interesting point: information consists of data plus how that data is understood. While data itself is objective, its meaning can be understood differently by different people. Different understandings may therefore lead to completely different decisions based on the very same data. Hence it is necessary to separate the two kinds of representation: what the data is differs from what the data is for.
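To make the point concrete, here is a minimal sketch in Python. Everything in it is my own hypothetical illustration, not something from Scott's talk: the `Reading` and `Community` classes, the altitude example, the two community names, and the 1000-meter threshold are all assumptions. It only shows how one objective value can drive two different decisions once two communities attach different meanings to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Raw data: a bare value with no meaning attached (hypothetical)."""
    field: str
    value: float


@dataclass(frozen=True)
class Community:
    """A community of interest with its own understanding of the data."""
    name: str
    unit_to_meters: float  # assumed convention for the 'altitude' field

    def altitude_m(self, reading: Reading) -> float:
        # Interpretation step: the same number means different things
        # depending on the community's assumed unit.
        return reading.value * self.unit_to_meters

    def decide(self, reading: Reading, min_safe_m: float = 1000.0) -> str:
        # Decision step: depends on the interpretation, not on the raw data.
        return "hold" if self.altitude_m(reading) >= min_safe_m else "climb"


reading = Reading(field="altitude", value=3000.0)    # the shared, objective data
aviators = Community("aviators", unit_to_meters=0.3048)  # reads the value as feet
surveyors = Community("surveyors", unit_to_meters=1.0)   # reads the value as meters

decisions = {c.name: c.decide(reading) for c in (aviators, surveyors)}
print(decisions)
```

Both communities receive exactly the same `Reading`, yet one decides to climb and the other to hold. Keeping the value and its interpretation in separate structures is one way to express the separation between what the data is and what the data is for.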

Let's look at the Semantic Web again. To build a real Semantic Web, the data in it must be existing, accessible, visible, and understandable. Existing means that data values and/or data descriptions have been created. Accessible means that the created data can be delivered to the correct destination in response to legitimate requests. Visible means that the data representation can be viewed and identified by humans. Understandable means that the data representation can be correctly and unambiguously identified by machines.

Evidently, one issue that current Semantic Web research overlooks is the relationship between visible and understandable. Many current Semantic Web practices assume that what machines understand must be what everybody expects; or, conversely, they assume that a small group's view of a domain can serve as the standard description of that domain for machines to act on. As the DoD experience has demonstrated, these assumptions are impractical in real-world ultra-scale applications, even in a military setting.

Semantic Web researchers must start to learn from Web 2.0 practices. A public domain agreement cannot be enforced in general. The enforcement model might work for a small domain with a limited number of participants, but it can never scale to something as large as the entire Web. This difficulty of scalability lies not only in the theory of conceptual modeling, which is no doubt a hard problem, but also in the intrinsic human expectation of freedom, which makes the difficult problem essentially unsolvable.

[1] Scott Renner, Net-Centric Information Management, 8th Int. C2 Research and Technology Symposium, McLean VA, 2005.


Wednesday, November 01, 2006

Next Generation Web

What is the next generation of the World Wide Web? Is it the Semantic Web, or is it something else?

This is my view: the next generation Web will be the coupling of Web 2.0 and the Semantic Web. This view comes from observing how humans grow up. As a metaphor, we may compare the current World Wide Web (Web 1.0) to a baby. And indeed Web 1.0 is a baby, because Web 1.0 pages only display information, without direct communication with readers and without sufficient machine understanding. Similarly, babies care only about their own interests, without thinking about others or about whether their messages can be understood by others.

When human babies grow up, they start to develop simultaneously in two ways. First, they learn to communicate with other people, especially other children. They begin to talk to each other and improve their own knowledge through the collective intelligence of the group. This is exactly the philosophy of Web 2.0.

Second, children begin to learn from books. Once they go to school, they start learning "standard" and "formal" specifications of facts about the world, which are understood by the general public. This is the philosophy of the Semantic Web.

As children grow up, the two processes interact with each other. A proper balance between social activities and textbook learning is key to a child's individual development.

This picture of a growing child is a model for the evolution of the World Wide Web. Today, any overemphasis on either Web 2.0 or the Semantic Web is unhealthy for the Web's evolution. We have to combine both aspects properly to achieve a well-functioning next generation Web.
