Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Explorer's Guide to the Semantic Web - Chapter 1: The Semantic Web

0.00/5 (No votes)
5 Jan 2005 1  
The Semantic Web.

passin_cover150.jpg

Author Thomas B. Passin
Title Explorer's Guide to the Semantic Web
Publisher Manning Publications Co.
Published June, 2004
ISBN 1932394206
Price US$ 39.95
Pages 304

The Semantic Web

So he went down to the agora, or marketplace, where there were a lot of unemployed philosophers�which means philosophers which were not thinking at that time.

Thought�in other words, philosophers can tell you millions of things that thought isn�t, and they can�t tell you what it is�and this bugs them!

�Severn Darden, Lecture on Metaphysics

In the beginning, there was no Web. The Web began as a concept of Tim Berners-Lee, who worked for CERN, the European organization for physics research. CERN�s technical staff urgently needed to share documents located on their many computers. Berners-Lee had previously built several systems to do that, and with this background, he conceived the World Wide Web. The design had a relatively simple technical basis, which helped the technology take hold and gain critical mass.

Berners-Lee wanted anyone to be able to put information on a computer and make that information accessible to anyone else, anywhere. He hoped that eventually, machines would also be able to use information on the Web. Ultimately, he thought, this would allow powerful and effective human-computer-human collaboration:

I have always imagined the information space as something to which everyone has immediate and intuitive access, and not just to browse but to create� Machines become capable of analyzing all the data on the Web�the content, links, and transactions between people and computers.

�when [the Semantic Web] does [emerge], the day-to-day mechanisms of trade, bureaucracy, and our daily lives will be handled by machines talking to machines, leaving people to provide the inspiration and intuition. (Berners- Lee 2000)

I find this vision inspiring, and the means to get there intriguing. The Semantic Web has, in a way, become almost a celebrity�Scientific American has even published an article on it (Berners-Lee, Hendler, and Lassila 2001)�although most people don�t know what it is, and although there really isn�t a Semantic Web yet. There are many different ideas of what it is, not just one. In this chapter, we examine a range of ideas about what the Semantic Web should be. Some of them may seem futuristic or impractical, but a great deal of work is going on in all the areas we�ll examine.

1.1 What is the Semantic Web?

The word semantic implies meaning or, as WordNet defines it, �of or relating to the study of meaning and changes of meaning�. For the Semantic Web, semantic indicates that the meaning of data on the Web can be discovered�not just by people, but also by computers. In contrast, most meaning on the Web today is inferred by people who read web pages and the labels of hyperlinks, and by other people who write specialized software to work with the data. The phrase the Semantic Web stands for a vision in which computers�software�as well as people can find, read, understand and use data over the World Wide Web to accomplish useful goals for users.

Of course, we already use software to accomplish things on the Web, but the distinction lies in the words we use. People surf the Web, buy things on web sites, work their way through search pages, read the labels on hyperlinks, and decide which links to follow. It would be much more efficient and less time-consuming if a person could launch a process that would then proceed on its own, perhaps checking with the person from time to time as the work progressed. The business of the Semantic Web is to bring such capabilities into widespread use.

In brief, the Semantic Web is supposed to make data located anywhere on the Web accessible and understandable, both to people and to machines. This is more a vision than a technology. In this book, we�ll explore the technologies that will play roles in bringing the vision to life.

As you might expect, there are many different ideas about what this general vision encompasses. An almost overwhelming number of different ideas exists about the supposed nature of the Semantic Web, and that�s the first lesson to learn: The Semantic Web is a fluid, evolving, informally defined concept rather than an integrated, working system. To give you a feel for this range of ideas, here are some representative quotations about the nature of the Semantic Web:

  • The machine-readable-data view � �The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.� (W3C 2003)
  • The intelligent agents view � �The aim of the Semantic Web is to make the present Web more machine-readable, in order to allow intelligent agents to retrieve and manipulate pertinent information.� (Cost et al 2001)
  • The distributed database view � �The Semantic Web concept is to do for data what HTML did for textual information systems: to provide sufficient flexibility to be able to represent all databases and logic rules to link them together to great added value.� (W3C 2000) �A simple description of the Semantic Web is that it is an attempt to do for machine processable data what the World Wide Web did for human readable documents. Namely, to transform information processing by providing a common way that data can be accessed, linked together and understood. To turn the Web from a large hyperlinked book into a large interlinked database.� (SWAD-E)
  • The automated infrastructure view � �In his recent Scientific American article, Berners-Lee argues that the Semantic Web is infrastructure and not an application. We couldn�t agree more.� (Tuttle et al 2001) �Therefore, the real problem is the lack of an easy automation framework in the current Web�� (Garcia and Delgado 2001)
  • The servant-of-humanity view � �The vision of the Semantic Web is to let computer software relieve us of much of the burden of locating resources on the Web that are relevant to our needs and extracting, integrating, and indexing the information contained within.� (Cranefield 2001) �The Semantic Web is a vision of the next-generation web, which enables web applications to automatically collect web documents from diverse sources, integrate and process information, and interoperate with other applications in order to execute sophisticated tasks for humans.� (Anutariya et al 2001)
  • The better-annotation view � �The idea of a �Semantic Web� [Berners-Lee 2001] supplies the (informal) web, as we know it, with annotations expressed in a machine-processable form and linked together.� (Euzenat 2001)
  • The improved-searching view � �Soon it will be possible to access Web resources by content rather than just by keywords.� (Anutariya et al 2001) �The main goal [of the technology described in the paper] is to build a structured index of the Web site.� (Desmontils and Jacquin 2001)
  • The web services view � �Increasingly, the Semantic Web will be called upon to provide access not just to static documents that collect useful information, but also to services that provide useful behavior.� (Klein and Bernstein 2001)

    �The Semantic Web promises to expand the services for the existing web by enabling software agents to automate procedures currently performed manually and by introducing new applications that are infeasible today.� (Tallis, Goldman, and Balzer 2001)

It�s clear that this notion of the Semantic Web covers a lot of ground, and perhaps no two people have quite the same idea about it. Still, several themes are expressed time and again:

Let�s look more closely at these themes.

1.1.1 Indexing and retrieving information

Everyone wrestles with how to find information. Libraries have card catalogs, and now many have electronic indexes. Search engines are vital components of the Web. Yet at some point, everyone has been frustrated and annoyed by how hard it is to locate things, especially when you aren�t sure what to ask for. To find information, a Semantic Web approach would expect to go beyond keyword and alphabetical indexes to let users search by concepts and categories.

The Web part brings in a persistent theme, in which information is distributed� spread throughout the Web�rather than concentrated in a few repositories. Most systems that use concept identification to retrieve information maintain their own concept hierarchies and attempt to identify those concepts in the documents they index. Sometimes, concepts in a document collection are identified automatically, with varying success. To go further requires that documents be able to declare their own vocabularies and sets of concepts and to identify where they�re used.

1.1.2 Meta data

Card catalogs and electronic indexes contain data about the works that are cataloged and indexed. Data about other data is often called meta data. For example, the ISBN number and the author�s name are meta data about a novel. The data types describing the data in a database also fall into the category of meta data. It�s even possible to have meta meta data (a statement about the origin of a piece of meta data could be considered to be meta data about meta data, or meta meta data).

In one sense, meta data is still data; the distinction lies in the intended use of the data and in the subject of the meta data. It�s meta data that will be used for searches and for discovery of information. Annotation can also be thought of as meta data.

1.1.3 Annotation

In the world of physical documents (such as books), people write margin notes and comments, they underline and highlight passages, they staple new items to reports, and they add thoughts and ideas to those of the original authors. Markup languages like XML should, you�d think, be able to add such annotations; but today, it�s hard to do this in a simple way that lets other people share your annotations and lets you move your annotations to other applications and computers. Wiki-style web sites attempt to let many people comment on and modify web pages, but this process covers only a little of what people would like to do.

Because annotations should be shareable, and because the meaning of different types of annotations should be widely understood, support for extensive annotation capabilities is often seen as part of the Semantic Web.

1.1.4 A huge interoperable database

Today, it�s common to get data from a database over the Web. These databases are generally separate and not easily used as merged data sources, and a great deal of data exists outside of databases. This part of the Semantic Web vision sees ways to unify the description and retrieval of stored data, allowing much of the Web to be considered part of a large virtual database.

Consider a sports researcher looking for baseball data. There are various online baseball databases: The Major League Baseball web site is but one of many. But if our researcher wants to find performance statistics for Stan Musial, whose career lasted from the 1940s to the 1960s, she can�t get data for the whole period in a mutually compatible format. At least for baseball statistics, there is some common agreement on the definitions of the most important statistics, so that a batting average is always computed the same way�this is more than can be said for most separate collections of data.

If the Web functioned as an interoperable database, the researcher could get the data from all the important sites, and the researcher�s software would be able to either display all the data together or automatically combine data from, say, the Major League Baseball site and the Baseball Almanac.

1.1.5 Machine retrieval of data

This part of the vision focuses on automatic acquisition of data. This means that, a piece of software, in pursuit of its assignment, determines what data it needs and where and how to get it, and then goes out and gets the data. Using the baseball example from the previous section, suppose our researcher has to find the right web pages, load them, and then figure out a way to get the data and organize it. This is hard to do and often takes a lot of time. Under the Semantic Web, the data format and its manner of access would be described in a way that would allow the researcher�s computer to get and use the data automatically.

1.1.6 Services

A service is a behavior that provides a benefit. Examples include making reservations, arranging schedules, providing prices, placing orders, and so forth. Think of ordering, say, a perishable item like flowers or food. Once you�ve selected a product to buy, you have to make sure that its delivery will fit into your schedule. The price, buying conditions, delivery options, and your schedule can all be thought of as services that must be activated and coordinated. In the �Semantic Web as web services� view, all these services would publish machine-readable data that would allow a computer to do all the activation and coordination for you.

1.1.7 Discovery

To use services, you (and especially your software) must be able to find them, discover what they do, and learn how to invoke them. This is the realm of discovery of services. The most obvious approach would be to create directories of services with standard access methods. The services would be described in standard terms, and information about how to access them and the available information would be encoded in standard ways.

Consider an analogy with a physical library. Most libraries in the United States use either the Dewey Decimal System or the Library of Congress method to catalog their books. After using the card catalog or its electronic version, a person becomes familiar with the classifications and learns how to find books on the shelves. Here, the standard access methods are the familiar classification system and the physical arrangement of books in the library.

A more advanced approach would be to send out discovery requests based on the services required, and for candidate services to describe their capabilities in such a way that the would-be user could deduce their capabilities and instigate a conversation to find any missing or uncertain information. Returning to the library example, this would be like getting an experienced research librarian to tell you which reference books to look at and how to understand the information in them.

1.1.8 Intelligent agents

An agent is someone or something that acts on your behalf. A software agent would act in a somewhat autonomous way, communicating with other software agents (which might be specialized) to discover services, products or information for you. For instance, one of those specialized agents might know how to purchase airline tickets and make reservations. Another agent might perform the required services, passing the results back to your own agent, which would notify you of the outcome. It�s clear that a network of interacting agents would have to be able to describe its goals using established vocabularies, to discover services and information resources, and to use many of the capabilities described in the previous sections.

1.2 Two Semantic Web scenarios

To give you a feel for the way these areas might interact and how the Semantic Web could provide great value, here are two scenarios that were developed during the workshop �Research Challenges and Perspectives of the Semantic Web�.1 Both scenarios illustrate what might be called personal services. Of course, similar scenarios could be constructed for many other areas, such as business-to-business transactions. Note that the language is taken directly from the report without corrections for grammatical and spelling errors.

Scenario 1: A research assistant

During her stay at Honolulu, Clara ran into several interesting people with whom she exchanged vCards. When time to rest came in the evening, she had a look at her digital assistant summarizing the events of the day and recalling the events to come (and especially her keynote talk of the next day). The assistant popped up a note with a link to a vCard that reads: �This guy�s profile seems to match the position advertisement that Bill put on our intranet. Can I notify Bill�s assistant?�

Clara hit the �explain!� button. �I used his company directory for finding his DAML2 enhanced vita: he�s got the required skills as a statistician who led the data mining group of the database department at Montana U. for the requirement of a researcher who worked on machine learning.� Clara hit then the �evidence!� button. The assistant started displaying �I checked his affiliation with University of Montana, he is cited several times in their web pages: reasonably trusted; I checked his publication records from publishers� DAML sources and asked bill assistant a rating of the journals: highly trusted. More details?�

Clara had enough and let her assistant inform Bill�s.


1 This workshop was organized by the European Consortium in Informatics and Mathematics (ERCIM) for the European Union Future Emergent Technology program (EU-FET) and the US National Science Foundation (NSF). It was held in Sophia-Antipolis, France, in October 2001.

2 DARPA Agent Markup Language; see chapter 7.

Scenario 2: Negotiating a date

Bill�s and Peter�s assistants arranged a meeting in Paris, just before ISWC3 in Sardinia. Thanks to Peter�s assistant knowing he was vegetarian, they avoided a faux pas. Bill was surprised that Peter was able to cope with French (his assistant was not authorized to unveil that he married a woman from Qu�bec). Bill and Peter had a fruitful meeting and Bill will certainly be able to send Peter an offer before he came back to the US.

Before dinner, Peter investigated a point that bothered him: Bill used the term �Service� in an unusual way. He wrote: �Acme computing will run the trust rating service for semanticweb.org� (a sentence from Bill). His assistant found no problem, so he hit: �service�, the assistant displayed �service in {database} equivalent To: infrastructure�. Peter asked for �metainfo�, which raised �Updated today by negotiating with Bill�s assistant�.

Peter again asked for �Arguments!�: �Service in {database} conflicts with service in {web}�. �Explain!� �In operating system and database, the term services covers features like fault-tolerance, cache, security, that we are used to putting in the infrastructure. More evidence?� Peter was glad he had not to search the whole Web for an explanation of this. The two assistants detected the issue and negotiated silently a solution to this problem. He had some time left before getting to the th�atre de la ville. His assistant made the miracle to book him a place for a rare show of Anne-Theresa De Keermaeker�s troupe in Paris. It had to resort to a particular web service that it found through a dance-related common interest pool of assistants.


3 Presumably the International Semantic Web Conference.

In these scenarios, you can see quite a few Semantic Web areas in operation at the same time. Software agents (the digital assistants) are discovering meta data and information and processing it. Logical reasoning is not only used to make inferences, it�s also explained to the human user. Assessments of trust and reliability are deduced through networks of interacting information. We see the discovery of web services. It all seems so plausible and so useful.

1.2.1 Can the Semantic Web work this way?

What needs to be developed, what needs to be in place, for the Semantic Web to work as envisioned in the previous scenarios? The keys are the widespread interchange of data and ways to mark, indicate, or describe what that data is, how it�s structured, how it can be retrieved, and what it means. Each of these areas is a large undertaking in itself. But the Semantic Web will be a sociological development, too. Companies must cooperate where they might normally compete; academic research must be translated into practical systems; individuals must discover how they can contribute; and issues of for-profit versus free, of closed versus open systems, and of trust need to be worked out.

The task is much bigger than the building of the original World Wide Web. At that time, few people realized how many new capabilities the Web would unleash. Today, some of the basic infrastructure is already in place. There are organizations like the World Wide Web Consortium (W3C), whose purpose includes developing and advancing standards of importance to the Internet as a whole, including the Semantic Web. So the task is bigger, but the starting point is more advanced.

Can the visions be realized? Opinions vary�mine is that many of them will come to pass (some are already beginning to operate) and make a real difference in the lives of people who use the Web.

1.3 The Semantic Web�s foundation

The World Wide Web has certain design features that make it different from earlier hyperlink experiments. These features will play an important role in the design of the Semantic Web. The Web is not the whole Internet, and it would be possible to develop many capabilities of the Semantic Web using other means besides the World Wide Web. But because the Web is so widespread, and because its basic operations are relatively simple, most of the technologies being contemplated for the Semantic Web are based on the current Web, sometimes with extensions. However, web services (chapter 8) and agents (chapter 9) may step outside the architecture of the current Web, as you�ll see.4

The Web is designed around resources, standardized addressing of those resources (Uniform Resource Locators and Uniform Resource Indicators), and a small, widely understood set of commands. It�s also designed to operate over very large and complex networks in a decentralized way. Let�s look at each of these design features.


4 I�m referring in part to the so-called REST (Representation State Transfer) architecture and the controversy over whether current SOAP-based web services that don�t use this model would be better suited to the Web if they did.

1.3.1 Resources

The Web addresses, retrieves, links to, and modifies resources. A resource is intended to represent any idea that can be referred to. Usually, we think of these resources as being tangible packages of data (documents or pages), but the notion of a resource is more general in two ways. First, a resource can change over time and still be considered the same resource, addressed by the same Uniform Resource Identifier (URI). Thus, a series of drafts of a manuscript could be addressed by the same URI. Alternatively, a URI could denote one specific, unchanging version of the same document. The notion of resource is flexible enough to encompass both varying and fixed resources.

Strictly speaking, a resource itself is not retrieved, but only a representation of the resource. For some protocols, like File Transfer Protocol (FTP), the representation is normally a copy of a file. For others, like HTTP, the representation may or may not be a copy of a file. A resource can even be represented by different forms�a PDF file, an HTML page, a voice recording, and so on.

Second, and perhaps harder to grasp, a resource can be something that doesn�t yet exist, and that may never exist. A resource can be a concept or a reference to a real or fictitious person�something that can�t be addressed and transferred over a network, but that can be talked about, thought about. For the purposes of the Semantic Web, such a resource can be referred to or identified by a URI.5


5 For example, RFC 1737, �Functional Requirements for Uniform Resource Names� (a subset of URIs), says, �The purpose or function of a URN is to provide a globally unique, persistent identifier used for recognition, for access to characteristics of the resource or for access to the resource itself.� (Emphasis added.)

1.3.2 Standardized addressing

All resources on the Web are referred to by URIs. The most familiar URIs are those that address resources that can be addressed and retrieved; these are called URLs, for Uniform Resource Locators. These URIs have a uniform structure that can refer to the use of other protocols besides HTTP (like FTP), and they are easy to type and copy. They can be inserted into hyperlinks so that any addressable information can be easily linked.

1.3.3 Small set of commands

The HTTP protocol (the protocol used to send messages back and forth over the Web) uses a small set of commands. These commands are universally understood by web servers, clients (like browsers), and intermediate components like caches-which can reduce network traffic by storing copies of documents that were previously sent. With this limited set of commands, there is no question about what is being requested of the server and network, and no visibility into how the server may choose to carry out the requests. This model doesn�t provide security or personal privacy for the information being sent or requested; but, since it is simple and well understood, the model lends itself to the provision of additional layers of security.6

However, some architectures use complex messages or need to restrict the visibility of message contents, and they use an approach that�s more involved than basic HTTP. Other Internet protocols can be used, and additional messaging layers can be carried over HTTP as well (such as SOAP, whose name no longer stands for anything). There is some controversy over what methods should be used for the Web�as distinct from the Internet, which includes much more than the World Wide Web�and whether the Semantic Web architecture should restrict itself to the simpler architecture of the current Web.


6 There is some controversy over whether the web model supports security provisions better than other network architectures such as Remote Procedure Call (RPC) systems.

1.3.4 Scalability and large networks

The Web has to operate over a very large network with an enormous number of web sites and to continue to work as the network�s size increases. It accomplishes this, thanks to two main design features. First, the Web is decentralized. If you have a computer on the network, you can put a web server on it; and if you have a server, you can add resources to it without registering them anywhere else.

Second, each transaction on the Web (that is, a request and the subsequent response) contains all the information needed to handle the request. No data needs to be stored by the server from one request to another. However, many practical uses of the Web do require that some data be saved for a period of time. If you reserve a ticket and then order it on another web page, the system must store your ticket reservation and be able to connect it to your request to purchase. Since any web transaction is separate from all others, it�s harder to arrange to maintain data across a connected series of transactions. Independent interactions make possible a large, decentralized system where responses can be cached to allow faster responses and reduce network traffic.

Data that maintains some history of transactions is sometimes called state, as in �the state of the system�. Web transactions are stateless.7 If there is a business need to store information across several interactions, the server must provide special arrangements to make it happen.


7 When a cookie is stored on your computer, the cookie stores some state information. Unfortunately, this state doesn�t fit the web model well, so it can sometimes cause confusion between browser, server, and user.

1.3.5 Openness, completeness, and consistency

The Web is open, meaning that web sites and web resources can be added freely and without central controls. The assignment of domain names to servers does need some central authority to avoid duplicate names,8 but this in no way restricts your ability to establish web servers and the information they provide.

The Web is incomplete, meaning there can be no guarantee that every link will work or that all possible information will be available. It can be inconsistent: anyone can say anything on a web page, so different web pages can easily contradict each other. Information on the Web will never be fully consistent, and it also changes constantly. Just think of all the web pages you�ve returned to that changed since you last visited them, or that don�t even exist anymore. Software that wishes to draw logical conclusions from data on the Web must work with reasonable reliability in the face of all this change, potential inconsistency, and incompleteness.


8 The domain name is the general part of the server�s name�usually, many servers share a domain name. For example, in the URL www.cnn.com, the domain name is cnn.com.

1.3.6 The Web and the Semantic Web

For the Semantic Web to follow the current web model, it should use key aspects of the current World Wide Web:

  • Use URI-style addressing
  • Have notions of addressable and non-addressable resources (a non-addressable resource is something that can be talked about�like a car or a concept� but can�t be retrieved over a communications network)
  • Use protocols with a small and universally understood set of commands (likely to include extensions to the current command set)
  • Maintain little or, preferably, no state information
  • Be as decentralized as possible
  • Function on a large scale
  • Allow local caching of information to speed access and reduce network loads
  • Be able to operate with missing links and with incomplete and inconsistent information.

It�s an open question whether services and agents will be designed to�or will be able to�follow these prescriptions.

1.4 The Semantic Web layer cake

The W3C has been a leader in developing technologies for the Web. The organization is headed by Tim Berners-Lee, who, not resting on his earlier accomplishments in relation to the Web, has also been promoting the development of the Semantic Web. Many of the apparently foundational technologies, such as XML and RDF, have been developed by the W3C. So the W3C approach to the evolution of the Semantic Web is worth looking at.

The W3C web pages on the Semantic Web include a diagram labeled Architecture. This diagram, sometimes called the �Semantic Web layer cake�, has been reproduced often, and our own version of it is depicted in figure 1.1. Descriptions of the layers are as follows:

Figure 1.1 The layered technologies of the Semantic Web, according to Tim Berners-Lee and the W3C. Each layer is seen as building on�and requiring�the ones below it. The W3C has developed, or is in the process of developing, standards and recommendations for all but the top two layers, and the W3C recommendations for digital signatures and managing encryption keys will also play roles in the Trust layer.

  • XML�Extensible Markup Language. The language framework that, since 1998, has been used to define nearly all new languages that are used to interchange data over the Web.
  • XML Schema � A language used to define the structure of specific XML languages.
  • RDF�Resource Description Framework. A flexible language capable of describing all sorts of information and meta data. RDF is covered in chapter 2. Topic maps, a non-W3C alternative standard, are discussed in chapter 3.
  • RDF Schema � A framework that provides a means to specify basic vocabularies for specific RDF application languages to use. RDF Schema is covered in chapter 7.
  • Ontology � Languages used to define vocabularies and establish the usage of words and terms in the context of a specific vocabulary. RDF Schema is a framework for constructing ontologies and is used by many more advanced ontology frameworks. OWL is an ontology language designed for the Semantic Web. Chapter 7 discusses ontologies, including OWL.
  • Logic and Proof � Logical reasoning is used to establish the consistency and correctness of data sets and to infer conclusions that aren�t explicitly stated but are required by or consistent with a known set of data. Proofs trace or explain the steps of logical reasoning. Chapter 6 covers some issues relating to logic in the Semantic Web.
  • Trust � A means of providing authentication of identity and evidence of the trustworthiness of data, services, and agents. Chapter 10 covers issues of trust with regards to the Semantic Web.

Each layer is seen as building on the layer below. At the base, most data is expected to be created in XML formats. Each layer is progressively more specialized and also tends to be more complex than the layers below it. A lower layer doesn�t depend on any higher layers. Thus, the layers can be developed and made operational relatively independently. XML is in place now, and XML Schema has recently become standardized. RDF has been released as a W3C Recommendation,9 (and has just been re-released with changes). The other layers are under development, and their form and direction are progressively uncertain according to their altitude in the layer cake.

You should realize that this diagram represents the W3C view, and most of the technologies depicted in the diagram are W3C developed or endorsed. There are potential alternatives for some of the layers. Among others, alternative schemas exist for XML documents, and there are quite a few alternative efforts to develop ontology systems.

If you noticed that there is no layer labeled Web Services, it�s true that services don�t fit neatly into this layer cake. Such technologies make use of several layers, such as XML and XML Schemas�and perhaps, in the future, RDF and Ontology. This book also discusses other technologies and subjects that don�t appear in the layer-cake diagram.


9 The W3C publishes technology standards like HTML, the common Hypertext Markup Language. It calls them Recommendations, even though many people informally call them standards or specifications. In the W3C process, a document proceeds through a series of draft stages, moving from Working Draft through Candidate Recommendation before it gets released as an approved Recommendation.

1.4.1 The base

The Web holds an enormous amount of information, and most of it is in HTML format�the language used to describe the content of ordinary web pages. This works because HTML is so widely understood (by browsers) and also because HTML is simple for page creators to understand. HTML does describe the information it contains, but it does so in terms of generic units that apply to most ordinary documents�paragraphs, headings, images, tables, and so on. An HTML page can�t label a chunk of the page to say, �This is employee information from database �X�;�, it can only say (in terms that a computer can use), �This is a table, and here are its rows and columns�.

The HTML for a page describes these generic document units and their order, and it�s the browser�s job to decide how to display them. You can give browsers suggestions, which they will normally try to honor as best as they can. This approach has been wildly successful when the information is intended for people to read.

With XML, you can describe the structure of the information in other ways, not just in terms of generic document units. You can choose the kind of structure that�s most suitable for the particular information and anticipated use. Thus, XML is seen as the foundation layer of the Semantic Web.

The XML Schema layer provides the ability to specify structures and data types to be used by a particular XML document. The XML and XML Schema layers aren�t covered in this book, because they�re general-purpose technologies that aren�t specially related to the Semantic Web.

1.4.2 Properties and relationships

To a browser, the �meaning� of an HTML page lies in the widely shared understanding of how to display the different kinds of generic units that appear on a web page. For a general-purpose structure, there likewise needs to be a means of indicating the meaning of the different structural units, and this should also be widely shared. This is the role of RDF. However, the idea of meaning is complicated and has many levels, and RDF deals with only two of them: assigning properties to things and relating one thing to another.

The RDF Schema layer describes those properties�what they are, which resources they can be assigned to, and so on. The Ontology layer takes this a step further: not only does it describe the properties and terms that can be used, but it also can describe relationships between them.

These layers are used to describe, or represent, knowledge. Although the W3C version of the layer cake shows only RDF and RDF Schema, this book also discusses another candidate for representing knowledge: topic maps.

1.4.3 Analysis, verification, and trust

Once the relationships between resources, terms, and properties have been established, the statements that RDF expresses can be analyzed for consistency and inferences can be made. By this means, facts that aren�t explicitly stated can be discovered, and inconsistent facts can (sometimes) be reconciled. The Logic and Proof layer provides these capabilities.

When I purchase a book and give my credit card number, the bookseller wants to know if the card is mine. If I�m there in person, I can show my driver�s license to give a kind of credence to my claim of identity. In essence, I�m saying, �If you don�t believe me, then trust the licensing authority�. This is acceptable as long as the seller believes that the identity card is not counterfeit. The seller might assess the validity of the card by its visual appearance, the match between the picture and my face, the degree of agreement between the age on the card and my apparent age, and any number of other clues. Here, you see several principles in play: an appeal to authority, the trustworthiness of that authority, an inference of validity based on a set of facts, and beliefs about those facts.

When I buy the same book online, the seller likewise needs to have some assurance that the credit card is valid and that I�m authorized to use it. I need assurance that the web page really belongs to the bookseller and not to a criminal who wants to get access to my card. When software programs work with data they get over the Web, they face the same problems. The Trust layer will attempt to handle these issues. You can see how it will use all the other layers. The Trust machinery has to call on the Logic and Proof layer to analyze claims, make inferences, and draw conclusions. The Logic and Proof layer needs to know how the terms and properties relate to each other and whether they�re used correctly, which is the business of the Ontology layer. The Ontology layer needs to use the data structures defined and created by the RDF and XML Schema layers. These dependencies can also be viewed in the other direction: the RDF layer uses RDF Schema and Ontology as it assigns its properties, and the XML layer provides transportable data structures for the RDF information.

This layer cake structure sounds complicated, and it is. But it doesn�t all have to be in place before anything can be done. The important thing is to get widespread acceptance of relevant bits. That�s how the Web spread in the first place.

1.5 Summary

The Semantic Web is not a cut-and-dried, integrated technology. It�s a concept of how computers, people, and the Web can work together more effectively than is possible now. Because it�s visionary, it has no one definition. In fact, you saw a staggering array of notions earlier in this chapter, such as the machine-readable- data view, the intelligent agents view, and many more. However, these overlapping views have some aspects in common, and this book deals with these commonalities.

Basically, all the views assume that computers will be able to read and use data that today is mainly accessible to people. All views see computers as being able to use this data to perform tasks that help people. Within this broad range, certain themes show up repeatedly, as we�ve discussed in this chapter. The rest of this book covers each of these themes as they relate to the Semantic Web.

The current Web is the one success story we have of a very large, distributed, loosely connected, inconsistent system. It seems reasonable, therefore, that the Semantic Web should use the strengths of the current Web, especially key design patterns that have made it a success. In fact, it should be an extension of the current Web.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here