web3_1

One development suggested by Web 3.0 is the Semantic Web: the idea of a web where all information is organised and stored in such a way that a computer can be taught to understand it.

The web we have today is a huge collection of documents, and the words of all those documents are indexed. We can search for keywords, but this throws up any document with that keyword in it.

Web 2.0

The data must be described in a more structured way so that a computer can interpret more meaning from the information.

Web 3.0 creates a big collection of databases which can be linked on demand. This is a more efficient way of organising information on the web.

When a computer understands what data means, it can search, reason and combine information more intelligently.

The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?
Why not? Because we don't have a web of data. Because data is controlled by applications, and each application keeps it to itself.
The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.
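
As a rough illustration of the calendar questions above, the sketch below merges records from three separate applications once they expose their data in a shared shape. The record layouts, field names and sample values are all assumptions made up for this example; they are not part of any Semantic Web standard.

```python
from collections import defaultdict
from datetime import date

# Hypothetical exports from three separate applications (made-up sample data).
appointments = [{"date": date(2008, 5, 3), "what": "Dentist"}]
photos = [{"date": date(2008, 5, 3), "file": "beach.jpg"}]
bank_lines = [
    {"date": date(2008, 5, 3), "payee": "Cafe", "amount": -4.50},
    {"date": date(2008, 5, 4), "payee": "Bookshop", "amount": -12.00},
]


def build_calendar(**sources):
    """Group records from every source under the date they share."""
    calendar = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            calendar[record["date"]].append((source_name, record))
    return dict(calendar)


calendar = build_calendar(appointment=appointments, photo=photos, bank=bank_lines)
for day in sorted(calendar):
    kinds = [kind for kind, _ in calendar[day]]
    print(day.isoformat(), kinds)
```

The only thing the three sources need to agree on here is what a date is. That shared meaning, rather than a shared application, is what lets the data be combined.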

The Semantic Web is the extension of the World Wide Web that enables people to share content beyond the boundaries of applications and websites. It has been described in rather different ways: as a utopic vision, as a web of data, or merely as a natural paradigm shift in our daily use of the Web. Most of all, the Semantic Web has inspired and engaged many people to create innovative semantic technologies and applications. semanticweb.org is the common platform for this community.

What is the Semantic Web?

The Semantic Web is a web that is able to describe things in a way that computers can understand.
The Beatles was a popular band from Liverpool.
John Lennon was a member of the Beatles.
"Hey Jude" was recorded by the Beatles.

Sentences like the ones above can be understood by people. But how can they be understood by computers? Statements are built with syntax rules. The syntax of a language defines the rules for building the language statements. But how can syntax become semantic? This is what the Semantic Web is all about: describing things in a way that computer applications can understand. The Semantic Web is not about links between web pages. The Semantic Web describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price).
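
One way to picture this is to write each of the sentences above as a relationship between two things. The sketch below does so with plain Python tuples; the predicate names (is_a, comes_from, member_of, recorded_by) are invented for illustration and do not come from any real vocabulary.

```python
# Each statement becomes a (subject, predicate, object) tuple.
# The predicate names are invented for this sketch.
statements = [
    ("The Beatles", "is_a", "band"),
    ("The Beatles", "comes_from", "Liverpool"),
    ("John Lennon", "member_of", "The Beatles"),
    ("Hey Jude", "recorded_by", "The Beatles"),
]


def subjects_of(predicate, obj):
    """Return every subject linked to obj by the given predicate."""
    return [s for s, p, o in statements if p == predicate and o == obj]


print(subjects_of("member_of", "The Beatles"))
print(subjects_of("recorded_by", "The Beatles"))
```

Once the sentences are in this shape, a program can answer "who was a member of the Beatles?" by matching on the relationship rather than by searching for keywords.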
"If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database" Tim Berners-Lee, Weaving the Web, 1999

The Resource Description Framework
RDF (Resource Description Framework) is a language for describing information and resources on the web. Putting information into RDF files makes it possible for computer programs ("web spiders") to search, discover, pick up, collect, analyze and process information from the web. The Semantic Web uses RDF to describe web resources.

How can it be used?
If information about music, cars, tickets, etc. were stored in RDF files, intelligent web applications could collect information from many different sources, combine it, and present it to users in a meaningful way.
Information like this:
Car prices from different resellers
Information about medicines
Plane schedules
Spare parts for the industry
Information about books (price, pages, editor, year)
Dates of events
Computer updates

Can it be understood?

The Semantic Web is not a very fast growing technology. One of the reasons for that is the learning curve. RDF was developed by people with an academic background in logic and artificial intelligence. For traditional developers it is not very easy to understand. One fast growing language for building semantic web applications is RSS, whose 1.0 version is based on RDF.
In the following pages of this tutorial we will concentrate on using RDF to discover the potentials of the semantic web.

What Is The Semantic Web?
The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale. You can think of it as being an efficient way of representing data on the World Wide Web, or as a globally linked database.
The Semantic Web was thought up by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML. There is a dedicated team of people at the World Wide Web Consortium (W3C) working to improve, extend and standardize the system, and many languages, publications, tools and so on have already been developed. However, Semantic Web technologies are still very much in their infancy, and although the future of the project in general appears to be bright, there seems to be little consensus about the likely direction and characteristics of the early Semantic Web.

What's the rationale for such a system? Data that is generally hidden away in HTML files is often useful in some contexts, but not in others. The problem with the majority of data on the Web that is in this form at the moment is that it is difficult to use on a large scale, because there is no global system for publishing data in such a way that it can be easily processed by anyone. For example, just think of information about local sports events, weather information, plane times, Major League Baseball statistics, and television guides... all of this information is presented by numerous sites, but all in HTML. The problem with that is that, in some contexts, it is difficult to use this data in the ways that one might want to.
So the Semantic Web can be seen as a huge engineering solution... but it is more than that. We will find that as it becomes easier to publish data in a repurposable form, more people will want to publish data, and there will be a knock-on or domino effect. We may find that a large number of Semantic Web applications can be used for a variety of different tasks, increasing the modularity of applications on the Web. But enough subjective reasoning... on to how this will be accomplished.
The Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triple-based structures: i.e. many triples of URI data that can be held in databases, or interchanged on the World Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called "Resource Description Framework" syntaxes.
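
To make "triples of URI data held in databases" a little more concrete, here is a minimal sketch that stores a few triples in an in-memory SQLite table and looks them up by object. The http://example.org/ namespace and the property names are placeholders, and the one-table layout is just one simple assumption; real triple stores are organised in many different ways.

```python
import sqlite3

# In-memory database with one table holding (subject, predicate, object) URIs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

EX = "http://example.org/"  # placeholder namespace, not a real vocabulary
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (EX + "JohnLennon", EX + "memberOf", EX + "TheBeatles"),
        (EX + "HeyJude", EX + "recordedBy", EX + "TheBeatles"),
        (EX + "TheBeatles", EX + "comesFrom", EX + "Liverpool"),
    ],
)


def about(object_uri):
    """Return (subject, predicate) pairs whose object is object_uri."""
    rows = conn.execute(
        "SELECT subject, predicate FROM triples WHERE object = ? ORDER BY subject",
        (object_uri,),
    )
    return rows.fetchall()


for subject, predicate in about(EX + "TheBeatles"):
    print(subject, predicate)
```

Because every column holds a URI, two databases that use the same URI for the Beatles can be queried together without any prior agreement between them.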

URI - Uniform Resource Identifier

A URI is simply a Web identifier: like the strings starting with "http:" or "ftp:" that you often find on the World Wide Web. Anyone can create a URI, and the ownership of them is clearly delegated, so they form an ideal base technology on which to build a global Web. In fact, the World Wide Web is such a thing: anything that has a URI is considered to be "on the Web".
The syntax of URIs is carefully governed by the IETF, who published RFC 2396 as the general URI specification (since superseded by RFC 3986). The W3C maintains a list of URI schemes.
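
The generic URI syntax splits an identifier into a scheme, an authority, a path, a query and a fragment. The short sketch below uses the Python standard library's urlsplit to show those parts; the example.org URIs are placeholders chosen for illustration.

```python
from urllib.parse import urlsplit

# Example URIs; the hosts and paths are placeholders.
uris = [
    "http://www.example.org/people/john?view=full#bio",
    "ftp://ftp.example.org/pub/readme.txt",
]

for uri in uris:
    parts = urlsplit(uri)
    # scheme, authority (netloc), path, query and fragment, in that order
    print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
```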

RDF - Resource Description Framework

A triple can simply be described as three URIs. A language which utilises three URIs in such a way is called RDF: the W3C have developed an XML serialization of RDF, the "Syntax" in the RDF Model and Syntax recommendation. RDF XML is considered to be the standard interchange format for RDF on the Semantic Web, although it is not the only format. For example, Notation3 (which we shall be going through later on in this article) is an excellent plain text alternative serialization.

Once information is in RDF form, it becomes easy to process it, since RDF is a generic format, which already has many parsers. XML RDF is quite a verbose specification, and it can take some getting used to (for example, to learn XML RDF properly, you need to understand a little about XML and namespaces beforehand).
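
The following is a small illustrative sketch of what XML RDF looks like and how a program might read it; it is not the example from the original article. The ex: namespace and its recordedBy and title properties are made up, and the parsing function only understands the flat rdf:Description pattern shown, so it should not be mistaken for a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A tiny RDF/XML document; the ex: vocabulary is a placeholder.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/HeyJude">
    <ex:recordedBy rdf:resource="http://example.org/TheBeatles"/>
    <ex:title>Hey Jude</ex:title>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(rdf_xml):
    """Extract triples from the flat rdf:Description pattern used above.

    This handles only that one pattern; it is not a general RDF/XML parser.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes qualified names as "{namespace}local";
            # joining namespace and local name gives the property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in simple_triples(RDF_XML):
    print(triple)
```

Each child element of rdf:Description yields one triple: the rdf:about value is the subject, the element name is the predicate, and either the rdf:resource attribute or the element text is the object.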

Why RDF?

When people are confronted with XML RDF for the first time, they usually have two questions: "why use RDF rather than XML?", and "do we use XML Schema in conjunction with RDF?".
The answer to "why use RDF rather than XML?" is quite simple, and is twofold. Firstly, the benefit that one gets from drafting a language in RDF is that the information maps directly and unambiguously to a model, a model which is decentralized, and for which there are many generic parsers already available. This means that when you have an RDF application, you know which bits of data are the semantics of the application, and which bits are just syntactic fluff. And not only do you know that, everyone knows that, often implicitly without even reading a specification because RDF is so well known. The second part of the twofold answer is that we hope that RDF data will become a part of the Semantic Web, so the benefits of drafting your data in RDF now parallel those of drafting your information in HTML in the early days of the Web.
The answer to "do we use XML Schema in conjunction with RDF?" is almost as brief. XML Schema is a language for restricting the syntax of XML applications. RDF already has a built-in BNF that sets out how the language is to be used, so on the face of it the answer is a solid "no". However, using XML Schema in conjunction with RDF may be useful for creating datatypes and so on. Therefore the answer is "possibly", with the caveat that it is not really used to control the syntax of RDF. This is a common misunderstanding, perpetuated for too long now.

Screen Scraping, and Forms

For the Semantic Web to reach its full potential, many people need to start publishing data as RDF. Where is this information going to come from? A lot of it can be derived from many data publications that exist today, using a process called "screen scraping". Screen scraping is the act of literally getting the data from a source into a more manageable form (i.e. RDF) using whatever means come to hand. Two useful tools for screen scraping are XSLT (an XML transformations language), and RegExps (in Perl, Python, and so on).
However, screen scraping is often a tedious solution, so another way to approach it is to build proper RDF systems that take input from the user and then store it straight away in RDF. Data such as you may enter when signing up for a new mail account, buying some CDs online, or searching for a used car can all be stored as RDF and then used on the Semantic Web.
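
As a hedged sketch of the regular-expression approach to screen scraping mentioned above, the code below pulls values out of a small HTML fragment and turns them into triples. The HTML, the class names, the team names and the http://example.org/ namespace are all invented for this example; a real page would need its own pattern, and regular expressions tend to break when the markup changes.

```python
import re

# A made-up HTML fragment of the kind a sports-results page might contain.
HTML = """
<table>
  <tr><td class="team">Team A</td><td class="wins">12</td></tr>
  <tr><td class="team">Team B</td><td class="wins">9</td></tr>
</table>
"""

ROW = re.compile(
    r'<td class="team">(?P<team>[^<]+)</td><td class="wins">(?P<wins>\d+)</td>'
)

BASE = "http://example.org/"  # placeholder namespace


def scrape(html):
    """Turn each matching table row into one (subject, predicate, object) triple."""
    triples = []
    for match in ROW.finditer(html):
        subject = BASE + "team/" + match["team"].replace(" ", "_")
        triples.append((subject, BASE + "wins", int(match["wins"])))
    return triples


for triple in scrape(HTML):
    print(triple)
```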

Currently the focus of a W3C working group, the Semantic Web vision was conceived by Tim Berners-Lee, the inventor of the World Wide Web. The World Wide Web changed the way we communicate, the way we do business, the way we seek information and entertainment – the very way most of us live our daily lives. Calling it the next step in Web evolution, Berners-Lee defines the Semantic Web as “a web of data that can be processed directly and indirectly by machines.”

In the Semantic Web data itself becomes part of the Web and is able to be processed independently of application, platform, or domain. This is in contrast to the World Wide Web as we know it today, which contains virtually boundless information in the form of documents. We can use computers to search for these documents, but they still have to be read and interpreted by humans before any useful information can be extrapolated. Computers can present you with information but can’t understand what the information is well enough to display the data that is most relevant in a given circumstance. The Semantic Web, on the other hand, is about having data as well as documents on the Web so that machines can process, transform, assemble, and even act on the data in useful ways.

Imagine this scenario. You’re a software consultant and have just received a new project. You’re to create a series of SOAP-based Web services for one of your biggest clients. First, you need to learn a bit about SOAP, so you search for the term using your favorite search engine. Unfortunately, the results you’re presented with are hardly helpful. There are listings for dish detergents, facial soaps, and even soap operas mixed into the results. Only after sifting through multiple listings and reading through the linked pages are you able to find information about the W3C’s SOAP specifications.

Because of the different semantic associations of the word “soap,” the results you receive are varied in relevance, and you still have to do a lot of work to find the information you’re looking for. However, in a Semantic Web-enabled environment, you could use a Semantic Web agent to search the Web for “SOAP” where SOAP is a type of technology specification used in Web services. This time, the results of your search will be relevant. Your Semantic Web agent can also search your corporate network for the SOAP specification and discover if your colleagues have completed similar projects or have posted SOAP-related research on the network. Based on the semantic information available for SOAP, your agent also presents you with a list of related technologies. Now you know that WSDL, XML, and URI are all technologies related to SOAP, and that you’ll need to do some research on them, too, before beginning your project. Armed with the information returned by your Semantic Web agent, you read the related technology specifications and send emails to the colleagues who have made SOAP-related materials available on the network to ask for their input before starting your new project.
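
The difference between the two searches in this scenario can be sketched in a few lines. The catalogue entries and type names below are hypothetical; the "type" field simply stands in for the semantic information a real agent would read from published metadata.

```python
# Hypothetical catalogue entries; "type" plays the role of semantic metadata.
resources = [
    {"title": "SOAP Version 1.2 specification", "type": "TechnologySpecification"},
    {"title": "Soap for sensitive skin", "type": "Product"},
    {"title": "Tonight's soap opera listings", "type": "TelevisionProgramme"},
]


def keyword_search(term):
    """Plain keyword match: every entry mentioning the term comes back."""
    return [r["title"] for r in resources if term.lower() in r["title"].lower()]


def typed_search(term, wanted_type):
    """Keyword match restricted to resources of the requested type."""
    return [
        r["title"]
        for r in resources
        if term.lower() in r["title"].lower() and r["type"] == wanted_type
    ]


print(len(keyword_search("soap")))
print(typed_search("soap", "TechnologySpecification"))
```

The keyword search returns all three entries, while the typed search returns only the specification, because it can tell what kind of thing each result is.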

Now, fast forward a few years. You’re still happily employed as a software consultant, and today you’re taking a working lunch with one of your biggest clients. Her company has an emergency project at its San Francisco branch for which they need you to consult for two weeks, and she asks you to get to San Francisco as soon as possible to begin work. You take out your hand held computer, activate its Semantic Web agent, and instruct it to book a non-stop flight to San Francisco that leaves before 10 AM the next day. You want an aisle seat if it’s available. Once your agent finds an acceptable flight with an available aisle seat, it books it using your American Express card and assigns the charges to your client’s account in your accounting application. It also warns you that you’ll be missing a dentist appointment back home during your trip and adds a note to your calendar reminding you to reschedule. Next, you specify that you want a car service to the client’s site, so your agent scans the availability of limos with “very good” or higher service ratings and books an appointment to have you picked up 30 minutes after your flight lands. Your agent also books you at your favorite hotel in San Francisco, automatically securing the lowest rate using your rewards card number. Finally, the agent updates your calendar and your manager’s calendar with your trip information and prints out your confirmation documents back at your office.

With just a few clicks your Semantic Web agent found and booked your flight, hotel, and car service, then updated your accounting system and calendars automatically. It even compared your itinerary to your calendar and detected the scheduling conflict with your dentist appointment. To do all this, the agent had to find, interpret, combine, and act on information from multiple sources. This example, of course, is a long-term vision for applying the Semantic Web. It’s one that may or may not come to fruition, and only the future will tell. However, the vision itself is important for understanding the potential of Semantic Web technologies.

Considering the two examples above, the list of scenarios that could potentially benefit from Semantic Web technologies as they continue to evolve is limited only by the imagination. Think of the possibilities opened to everything from crime investigation, scientific research, and literary analysis – to shopping, finding long-lost friends, and vacation planning – when computers can find, present, and act on data in a meaningful way.

The Semantic Web agent does not include artificial intelligence – rather, it relies on structured sets of information and inference rules that allow it to “understand” the relationship between different data resources. The computer doesn’t really understand information the way a human can, but it has enough information to make logical connections and decisions.
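
A very small sketch of "structured information plus inference rules" is shown below. The facts, the predicate names and the two rules are assumptions chosen for illustration; real Semantic Web systems express such rules in dedicated schema and inference languages rather than in hand-written loops.

```python
# Known facts as (subject, predicate, object); names are illustrative only.
facts = {
    ("SOAP", "related_to", "WSDL"),
    ("SOAP", "built_on", "XML"),
    ("XML", "built_on", "Unicode"),
}


def infer(known):
    """Apply two hand-written rules until no new fact appears.

    Rule 1: related_to is symmetric.
    Rule 2: built_on is transitive.
    """
    known = set(known)
    while True:
        new = set()
        for s, p, o in known:
            if p == "related_to":
                new.add((o, "related_to", s))
            if p == "built_on":
                for s2, p2, o2 in known:
                    if p2 == "built_on" and s2 == o:
                        new.add((s, "built_on", o2))
        if new <= known:
            return known
        known |= new


closure = infer(facts)
print(("WSDL", "related_to", "SOAP") in closure)
print(("SOAP", "built_on", "Unicode") in closure)
```

Neither of the two printed facts was stated directly; both follow mechanically from the rules, which is the sense in which the agent "understands" the relationships without anything resembling human comprehension.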

Broadening Our Horizons
The vision of the Semantic Web is a “web of data” that not only harnesses the seemingly endless amount of data on the World Wide Web, but also connects that information with data in relational databases and other non-interoperable information repositories, for example, EDI systems. Considering that relational databases house the majority of enterprise data today, the ability of Semantic Web technologies to access and process it alongside other data from Web sites, other databases, XML documents, and other systems increases the amount of useful data available exponentially. In addition, relational databases already include a great deal of semantic information. Databases are organized in tables and columns based on the relationships between the data they house, and these relationships reveal the meaning (the semantics) of the data.
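
To illustrate how the tables and columns of a relational database already carry semantic information, the sketch below maps rows to triples: the row key becomes the subject, the column name becomes the predicate, and the cell value becomes the object. The table, its contents and the URI layout are all made up for this example and do not follow any particular mapping standard.

```python
# A relational table modelled as column names plus rows (made-up sample data).
TABLE = "employee"
COLUMNS = ("id", "name", "department")
ROWS = [
    (1, "Alice", "Sales"),
    (2, "Bob", "Engineering"),
]

BASE = "http://example.org/"  # placeholder namespace


def rows_to_triples(table, columns, rows, key="id"):
    """Map each non-key cell to a triple: row -> subject, column -> predicate."""
    key_index = columns.index(key)
    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[key_index]}"
        for column, value in zip(columns, row):
            if column != key:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


for triple in rows_to_triples(TABLE, COLUMNS, ROWS):
    print(triple)
```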

Data integration applications offer the potential for connecting disparate sources, but they require one-to-one mappings between elements in each different data repository. The Semantic Web, however, allows a machine to connect to any other machine and exchange and process data efficiently based on built-in, universally available semantic information that describes each resource. In effect, the Semantic Web will allow us to access all the information listed above as one huge database.

Defining Semantics and Relationships
Implementing the Semantic Web requires adding semantic metadata, or data that describes data, to information resources. This will allow machines to effectively process the data based on the semantic information that describes it. When there is enough semantic information associated with data, computers can make inferences about the data, i.e., understand what a data resource is and how it relates to other data.

ose properties. RDF statements are often referred to as “triples” that consist of a subject, predicate, and object, which correspond to a resource (subject), a property (predicate), and a property value (object). As a plain-English example, in the statement “John Lennon was a member of the Beatles”, “John Lennon” is the subject, “was a member of” is the predicate, and “the Beatles” is the object.
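
The subject, predicate and object structure described above can be sketched as a small data type, together with a function that writes a triple out as one line in the style of the N-Triples text format. The Triple class, the obj_is_literal flag and the http://example.org/ URIs are assumptions for illustration, and the escaping covers only backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI or plain literal value
    obj_is_literal: bool = False


def to_ntriples(triple):
    """Render one triple as a line in N-Triples style."""
    if triple.obj_is_literal:
        # Only backslash and double quote are escaped in this sketch.
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


EX = "http://example.org/"  # placeholder namespace
member = Triple(EX + "JohnLennon", EX + "memberOf", EX + "TheBeatles")
title = Triple(EX + "HeyJude", EX + "title", "Hey Jude", obj_is_literal=True)

print(to_ntriples(member))
print(to_ntriples(title))
```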


THE SEMANTIC WEB