Connecting the Dots: A Semantic Web Primer
Jinfo Blog

3rd October 2011

By Candice Fong

Abstract

The much talked-about semantic web appears to be getting ever closer, and organisations like Google, Best Buy, TripIt and ZoomInfo are already using semantic technologies to connect with users or increase their online presence. Candice Fong provides a very useful introduction to the technologies and tools, and asks whether the law may benefit from the semantic web.

Definition

The semantic web is another way to “represent web content in a form more easily machine-processable and to use intelligent techniques to take advantage of these representations” (Antoniou & van Harmelen, p3). In other words, the semantic web is one where machines can automatically process the meaning of content and make meaningful connections and links among pieces of information. It is not about machines being able to understand content, but about their being able to process it effectively (Antoniou & van Harmelen, p3; Alesso & Smith, p37).

How does it work?

Now that we know what the semantic web is, how is it made possible? A number of tools aid machine processing, among them metadata, ontologies and logic (Alesso & Smith, p39).

XML: Like HTML (HyperText Markup Language), XML (Extensible Markup Language) is a markup language that lets authors mark up content with tags. However, the major difference between the two is that XML carries structured data, i.e. “information about pieces of the document and their relationships” (Antoniou & van Harmelen, p24). This structure is expressed in two ways: 1) tags describe each piece of information; and 2) nesting, so that tags within tags also convey information. For example, if one tag is nested within another, the machine processing the information knows the exact relationship between the two. XML is powerful because it allows users to define their own tags to suit their needs, but this, in turn, emphasises the importance of web ontologies and shared vocabularies to ensure machine accessibility.
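To make the two structuring mechanisms concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The tag names (`book`, `title`, `authors`, `author`, `publisher`, `year`) are hypothetical, user-defined tags chosen for illustration, and the data comes from the further reading list below; the point is only that nesting lets a program recover which names belong to which book.

```python
import xml.etree.ElementTree as ET

# The tag names here are hypothetical, user-defined tags:
# XML itself does not prescribe them.
doc = """<book>
  <title>A Semantic Web Primer</title>
  <authors>
    <author>Grigoris Antoniou</author>
    <author>Frank van Harmelen</author>
  </authors>
  <publisher>The MIT Press</publisher>
  <year>2004</year>
</book>"""

root = ET.fromstring(doc)

# 1) Tags describe each piece of information...
title = root.find("title").text
year = int(root.find("year").text)

# 2) ...and nesting conveys relationships: each <author> sits inside
# <authors>, which sits inside <book>, so a program can tell that these
# names are the authors of this particular book.
authors = [a.text for a in root.findall("./authors/author")]

print(title, year, authors)
```

Because the tags are invented by the author of the document, another program only knows that `author` means "author" if both sides agree on the vocabulary, which is exactly the gap that ontologies aim to fill.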

RDF: RDF (Resource Description Framework) is another model that lets users “describe resources using their own vocabularies” (Antoniou & van Harmelen, p80). This vocabulary is typically defined using RDF Schema (RDFS). In essence, RDF and RDF Schema together bring in the semantic element and provide a way to describe the relationships among resources (Alesso & Smith, p92). However, RDF and RDF Schema have shortcomings: they are not user-friendly, and their ability to convey complex meaning is still weak.
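RDF expresses every statement as a subject-predicate-object triple. The following is a minimal sketch, assuming triples are held as plain Python tuples rather than in a real RDF store. The `ex:` terms are a hypothetical vocabulary; `rdf:type` and `rdfs:subClassOf` are standard RDF and RDF Schema terms. The sketch applies just two RDF Schema inference rules (subclass transitivity and type propagation), which is enough to show how a schema lets a program derive facts that were never stated explicitly.

```python
# Prefixed names stand in for full IRIs. "rdf:" and "rdfs:" are the standard
# W3C prefixes; "ex:" is a hypothetical vocabulary used only for illustration.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Each statement is a (subject, predicate, object) triple.
triples = {
    ("ex:Primer", SUBCLASS, "ex:Book"),          # schema: every primer is a book
    ("ex:Book", SUBCLASS, "ex:Publication"),     # schema: every book is a publication
    ("ex:SemanticWebPrimer", RDF_TYPE, "ex:Primer"),
    ("ex:SemanticWebPrimer", "ex:publishedBy", "ex:MITPress"),
}

def rdfs_closure(graph):
    """Apply two RDF Schema rules until no new triples appear:
    (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    (x type A),       (A subClassOf B)  =>  (x type B)
    """
    graph = set(graph)
    while True:
        new = set()
        for (a, p, b) in graph:
            for (c, q, d) in graph:
                # Both rules need the second triple to be a subClassOf link from b.
                if b != c or q != SUBCLASS:
                    continue
                if p in (SUBCLASS, RDF_TYPE):
                    new.add((a, p, d))
        if new <= graph:
            return graph
        graph |= new

closed = rdfs_closure(triples)
types = sorted(o for (s, p, o) in closed
               if s == "ex:SemanticWebPrimer" and p == RDF_TYPE)
print(types)  # ['ex:Book', 'ex:Primer', 'ex:Publication']
```

Only `ex:SemanticWebPrimer rdf:type ex:Primer` was asserted; the other two types are inferred from the class hierarchy. A real application would use an RDF library and a complete RDFS reasoner rather than this hand-rolled loop.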

OWL: To overcome the inherent weaknesses of RDF and RDF Schema, the Web Ontology Language (OWL) was created to “standardise the definition of real world concepts” (Alesso & Smith, p113) and to allow for shared meaning even when different terminology is used. Basically, a web ontology consists of a taxonomy plus a set of inference rules, so that new knowledge can be derived and automated reasoning performed (Alesso & Smith, p112). However, there is a trade-off: the richer the language, the more inefficient the reasoning support becomes, until eventually the reasoning can no longer be automated (Antoniou & van Harmelen, p112).
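The idea of a taxonomy plus a set of inference rules can be sketched in the same style. This is a minimal illustration, assuming triples are plain Python tuples and implementing only three rules loosely modelled on the OWL 2 RL profile: equivalent classes, transitive properties and type propagation. The vocabularies `a:`, `b:` and `ex:`, and every individual in them, are hypothetical; `owl:equivalentClass` and `owl:TransitiveProperty` are genuine OWL terms. The equivalence declaration shows how shared meaning survives different terminology (`a:Lawyer` versus `b:Attorney`).

```python
# "rdf:", "rdfs:" and "owl:" are the standard W3C prefixes;
# "a:", "b:" and "ex:" are hypothetical vocabularies for illustration only.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
EQUIVALENT = "owl:equivalentClass"
TRANSITIVE = "owl:TransitiveProperty"

def infer_once(graph):
    """Return every triple derivable in one step from three OWL-style rules."""
    new = set()
    # Rule 1: equivalent classes are subclasses of each other.
    for (s, p, o) in graph:
        if p == EQUIVALENT:
            new.add((s, SUBCLASS, o))
            new.add((o, SUBCLASS, s))
    # Properties declared transitive (subClassOf is transitive as well).
    transitive = {s for (s, p, o) in graph if p == RDF_TYPE and o == TRANSITIVE}
    transitive.add(SUBCLASS)
    for (x, p, y) in graph:
        for (y2, q, z) in graph:
            if y != y2:
                continue
            # Rule 2: (x P y), (y P z), P transitive  =>  (x P z)
            if p == q and p in transitive:
                new.add((x, p, z))
            # Rule 3: (x type A), (A subClassOf B)  =>  (x type B)
            if p == RDF_TYPE and q == SUBCLASS:
                new.add((x, RDF_TYPE, z))
    return new

def closure(graph):
    """Apply the rules repeatedly until no new triples are produced."""
    graph = set(graph)
    while True:
        new = infer_once(graph) - graph
        if not new:
            return graph
        graph |= new

ontology = {
    # Two vocabularies use different terms for the same concept.
    ("a:Lawyer", EQUIVALENT, "b:Attorney"),
    ("ex:alice", RDF_TYPE, "a:Lawyer"),
    # A transitive "part of" relation for the structure of legislation.
    ("ex:partOf", RDF_TYPE, TRANSITIVE),
    ("ex:section12", "ex:partOf", "ex:chapter3"),
    ("ex:chapter3", "ex:partOf", "ex:act"),
}

facts = closure(ontology)
print(("ex:alice", RDF_TYPE, "b:Attorney") in facts)       # True
print(("ex:section12", "ex:partOf", "ex:act") in facts)    # True
```

Neither printed triple was stated directly. Keeping the rule set this small is what keeps the loop simple and guaranteed to terminate; supporting the full expressiveness of OWL requires a dedicated reasoner, which illustrates the trade-off between a richer language and efficient reasoning.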

Examples

A number of organisations are taking advantage of semantic web technologies and tools to connect with users or increase their online presence.

Google (www.google.com)

Google has been experimenting with the semantic web through its rich snippets initiative. For example, suppose a friend tells you about a restaurant that opened last year. When you search for the restaurant in Google, the search results will show not only the restaurant’s website but also its rating on Urbanspoon, how many people have reviewed it, and its price range.

Best Buy (www.bestbuy.com)

Best Buy is taking advantage of RDFa, a W3C technique for embedding semantic metadata in HTML or XHTML webpages through extra attributes. Best Buy employees can enter and update data particular to their store, and this metadata helps their store get picked up in searches and social applications. The driving force behind this is GoodRelations (http://web.freepint.com/5betds), a standardised vocabulary for e-commerce that is embedded in the webpage markup.
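RDFa works by adding attributes such as `about`, `property` and `content` to ordinary HTML elements, so the same page serves both human readers and machines. The following is a minimal Python sketch, using only the standard library's `html.parser`, of how a crawler might pull (subject, property, value) statements out of such markup. It handles only those three attributes, assumes well-formed markup without void elements such as `<meta>`, and ignores prefix declarations, `typeof`, `rel` and other RDFa features. The `ex:` property names and the store details are hypothetical; a real store page would use terms from a vocabulary such as GoodRelations.

```python
from html.parser import HTMLParser

class MiniRDFaParser(HTMLParser):
    """Collects (subject, property, value) triples from a tiny subset of RDFa:
    the about, property and content attributes only (a simplification)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open element: [subject, property, text parts]
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element inherits its parent's subject unless it sets "about".
        inherited = self.stack[-1][0] if self.stack else None
        subject = attrs.get("about", inherited)
        prop = attrs.get("property")
        if prop and "content" in attrs:
            # A content attribute supplies the machine-readable value directly.
            self.triples.append((subject, prop, attrs["content"]))
            prop = None
        self.stack.append([subject, prop, []])

    def handle_data(self, data):
        # Text counts towards every open element still waiting for a value.
        for subject, prop, parts in self.stack:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        subject, prop, parts = self.stack.pop()
        if prop:
            self.triples.append((subject, prop, "".join(parts).strip()))

# Hypothetical store markup; "ex:" terms are placeholders, not GoodRelations terms.
page = """
<div about="#store">
  <h2 property="ex:storeName">Example Electronics, Downtown</h2>
  <p>Opens at <span property="ex:opens" content="09:00">9am</span>,
     call <span property="ex:telephone">555-0100</span></p>
</div>
"""

parser = MiniRDFaParser()
parser.feed(page)
parser.close()
for triple in parser.triples:
    print(triple)
```

The `content` attribute lets the page show "9am" to people while giving machines the normalised value `09:00`, which is how structured data can sit inside a page without changing what visitors see.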

TripIt (www.tripit.com)

TripIt is a travel service that aggregates airline travel, car rentals, hotel information and other travel data. It creates a single itinerary in one place so that travel plans are easier to keep track of: all the person needs to do is forward the confirmation emails, and TripIt automatically builds the itinerary. In addition, a person can book activities, check in for flights, and consult maps and weather reports without having to visit another site.

ZoomInfo (www.zoominfo.com)

If you have ever been curious about what your online profile might look like, this is the site. Like TripIt, it pulls together information from many sources, in this case to build a profile of a person. However, the results depend on the underlying metadata, so a person’s profile can sometimes be wrong. For example, searching for my name produces a profile that clearly refers to my time in the MLIS program, but this has been interpreted as my being an employee of The M’LIS Company. This exposes a shortcoming of the semantic web: machine processing is only as good as the metadata it relies on and the web ontology, if any, that gives that metadata its meaning.

The semantic web and the law

The semantic web is definitely making inroads, even though it may not be widely understood, recognised or apparent. But what does this mean for the legal community, and in particular for legal research? Brian Harley wrote a series of three articles in The Columbia Science and Technology Law Review (http://web.freepint.com/5betdt) addressing this issue. In essence, there is so much legal data that it is increasingly hard to handle it all, and the risk of information overload is great. Law may be a good candidate for semantic technology because legal taxonomies and formalised rules already exist, and translating them into structured data would not be a great stretch. Imagine typing in a question and having the answer appear, without having to read cases and legislation, understand search strategies, or consult multiple sources and databases.

However, the greater challenge is that the law is not set in stone: it changes and evolves. We must also acknowledge that many legal issues are not black and white but nuanced, so how does one convey that to a machine? In the end, there is still a need for the human mind to make sense of the law, and it will be interesting to see whether the semantic web or some other technology can ever replace that.

(This article first appeared in the TALL Quarterly, Spring 2011, volume 30, issue 1)

Further reading

1) Semantic Web for Dummies by Jeffrey T. Pollock, Wiley Publishing Inc., 2009, Indianapolis, Indiana

2) Thinking on the Web: Berners-Lee, Gödel and Turing by H. Peter Alesso & Craig F. Smith, John Wiley & Sons, Inc., 2009, Hoboken, New Jersey

3) A Semantic Web Primer by Grigoris Antoniou & Frank van Harmelen, The MIT Press, 2004, Cambridge, Massachusetts
