Tuesday, October 03, 2006

Role of URI for Machine Understanding (Brainstorming with Tim Berners-Lee, issue 1)

(revised August 1st, 2008)

Well, where should I start? Beginning with a brainstorm on Tim's blog might be a good idea. Without his invention of the World Wide Web, this blog conversation could not have happened.

In his blog, Tim first gave his opinions about URIs. As I understand it, a fundamental requirement for machine understanding is associating every piece of Web data with a URI. Two identical URIs simply denote the same real-world object. This philosophy is the cornerstone of current machine understanding.

Human understanding also begins from a similar fundamental agreement. When a foreigner tries to communicate with a native speaker, they point their fingers at the same items while speaking in different terms, and gradually they come to understand each other. What these fingers are to humans, URIs are to machines.

Unless explicitly specified otherwise, different URIs by default denote different things (like two fingers pointing to different places). This rule is probably the most fundamental one in "machine understanding." Without it, the general problem of identifying Web objects could become very complicated.
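To make the rule concrete, here is a minimal sketch of how a program might apply it. The class name `UriIdentity`, its methods, and the example URIs are all hypothetical, invented for illustration; the sketch simply treats two URIs as the same object only when they are identical or when someone has explicitly declared them equivalent.

```python
class UriIdentity:
    """Tracks which URIs are known to denote the same real-world object.

    Hypothetical helper: distinct URIs are treated as distinct objects
    unless an equivalence has been declared explicitly.
    """

    def __init__(self):
        # Maps each URI to a representative of its equivalence group.
        self._parent = {}

    def _find(self, uri):
        # Register unseen URIs as their own representative.
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            # Shorten the chain as we walk up to the representative.
            self._parent[uri] = self._parent[self._parent[uri]]
            uri = self._parent[uri]
        return uri

    def declare_same(self, uri_a, uri_b):
        """Explicitly state that two URIs denote the same object."""
        self._parent[self._find(uri_a)] = self._find(uri_b)

    def same_object(self, uri_a, uri_b):
        """True if the URIs are identical or were declared equivalent."""
        return self._find(uri_a) == self._find(uri_b)


ids = UriIdentity()
home = "http://example.org/people/alice#me"      # made-up URIs
work = "http://example.com/staff/42#person"
other = "http://example.net/users/alice"

print(ids.same_object(home, home))   # identical URIs: same object
print(ids.same_object(home, work))   # different URIs: different by default
ids.declare_same(home, work)         # the explicit exception to the default
print(ids.same_object(home, work))
print(ids.same_object(home, other))  # still different, nothing was declared
```

Declared equivalences chain together here, so stating that A is B and B is C also makes A and C the same object. That is one possible design choice, not something the rule above requires.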

Everyone deserves a URI! This is a brilliant point. One valuable but challenging problem in current Web development is human identification. When we type a friend's name into a current search engine, such as Google, we often get many results for people who share the same name. If every Web user had a unique URI that served as his or her unique Web ID, it would be much easier for search engines to filter the results.

A question, however, is where a personal ID URI should point. The URI might point to a homepage, a picture, a short personal description, a string of numbers such as a social security number, or many other things. Any of these options could work, but each has its limitations. For example, a string of numbers is easy to store and convenient for machines to process, but it is also easy to steal and forge. A biography, on the other hand, is semantically rich, harder to forge, and easier to check for integrity. But authoring a biography for every person is much more time-consuming, and it is unclear who would be authorized to take charge of these biographies.

Tim suggested using FOAF RDF documents as unique personal identifiers. FOAF defines well-designed and easy-to-process attributes for describing individual persons. A problem, however, is that its RDF content is tailored to sharing information about friends rather than to identifying individuals. Is it really suitable for individual identification? This is an interesting problem that is worth exploring in the future.
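As one small way to explore this question, FOAF includes a property, foaf:mbox_sha1sum, defined as the SHA-1 hash of a person's mailto: URI. The sketch below is only an illustration under assumptions: the records are plain Python dictionaries standing in for parsed FOAF documents, the names and e-mail addresses are made up, and the helper functions are hypothetical. It treats two descriptions as the same person only when their hashed mailboxes match.

```python
import hashlib


def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI, as used by foaf:mbox_sha1sum."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def same_person(desc_a, desc_b):
    """Hypothetical check: match two descriptions on the hashed mailbox only."""
    a = desc_a.get("foaf:mbox_sha1sum")
    b = desc_b.get("foaf:mbox_sha1sum")
    # A missing hash gives no evidence either way, so report no match.
    return a is not None and a == b


# Made-up descriptions; a real FOAF document would be RDF, not a dict.
blog_profile = {
    "foaf:name": "Alice Example",
    "foaf:mbox_sha1sum": mbox_sha1sum("alice@example.org"),
}
forum_profile = {
    "foaf:name": "A. Example",
    "foaf:mbox_sha1sum": mbox_sha1sum("alice@example.org"),
}
namesake = {
    "foaf:name": "Alice Example",
    "foaf:mbox_sha1sum": mbox_sha1sum("alice@example.net"),
}

print(same_person(blog_profile, forum_profile))  # same mailbox hash
print(same_person(blog_profile, namesake))       # same name, different mailbox
```

The sketch also shows the limitation raised above: the match depends entirely on one attribute, so a person with several mailboxes, or a description without one, cannot be identified this way.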

