RDF for the beginner: various data models compared with RDF (written for beginners!)
First things first: the Resource Description Framework (RDF) has been a W3C Recommendation since 2004, and was updated to version 1.1 in 2014. Its purpose is to enable for data what the Web has enabled for documents. Since the dawn of the Web, one of its main strengths has been to let users interlink documents easily, creating browsing paths for other Web users. One Web page can be the starting point to any number of other Web pages. With RDF, it's the same, but with data.
A graph is a structure in which each node can be related to any number of other nodes. Without further ado, let's see what our family data looks like as a graph (not RDF yet):
The graph of the Landais-Todorov family
This graph doesn't represent all the data, but it shows the key features of a graph model: each node can be linked directly to any number of other nodes via unidirectional relations. For instance, if we were interested in uncle/nephew relationships and wanted a more direct connection, we could simply link Isaac and Salomé to their uncle Boris with a hasUncle property, without affecting the rest of the graph. The data model does not limit us.
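To make the idea concrete, here is a minimal Python sketch (standard library only, not an RDF tool) of a graph stored as an adjacency list, where each node keeps a list of labelled, one-way edges. The node names come from the family example; the helper `link` is hypothetical, and the hasUncle edges point from each child to Boris, so they read "Isaac hasUncle Boris".

```python
from collections import defaultdict

# Each node maps to a list of (relation, target) edges: one-way, labelled links.
graph: dict[str, list[tuple[str, str]]] = defaultdict(list)


def link(source: str, relation: str, target: str) -> None:
    """Add one unidirectional, labelled edge from source to target."""
    graph[source].append((relation, target))


# A few edges from the family graph.
link("Olga", "isInLoveWith", "Eric")
link("Boris", "hasSister", "Olga")
link("Salomé", "hasBrother", "Isaac")

# Snapshot of the existing edges, to check later that nothing else changed.
before = {node: list(edges) for node, edges in graph.items()}

# Add the direct uncle relationship: only new edges are appended.
link("Isaac", "hasUncle", "Boris")
link("Salomé", "hasUncle", "Boris")

for node, edges in graph.items():
    for relation, target in edges:
        print(f"{node} --{relation}--> {target}")
```

Adding the two hasUncle edges only appends to Isaac's and Salomé's lists; the edges of Olga and Boris stay exactly as they were, which is what "without affecting the rest of the graph" means here.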
Another way to represent a graph is to write out the triples that compose it. A triple is a statement composed of a subject, a property (or predicate) and an object.
Subject | Predicate | Object |
Olga | isInLoveWith | Eric |
Boris | hasSister | Olga |
Salomé | hasBrother | Isaac |
... | ... | ... |
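The triple table above maps directly onto a data structure: a set of (subject, predicate, object) tuples. This Python sketch uses only the standard library; the `match` helper is hypothetical and treats `None` as a wildcard, a simple way to ask questions of a graph stored as triples.

```python
from typing import Optional

Triple = tuple[str, str, str]

# The triples from the table, stored as (subject, predicate, object).
triples: set[Triple] = {
    ("Olga", "isInLoveWith", "Eric"),
    ("Boris", "hasSister", "Olga"),
    ("Salomé", "hasBrother", "Isaac"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern, sorted; None matches anything."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who is Olga in love with?
print(match(s="Olga", p="isInLoveWith"))  # [('Olga', 'isInLoveWith', 'Eric')]
# Which statements have Olga as their object?
print(match(o="Olga"))  # [('Boris', 'hasSister', 'Olga')]
```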
RDF is a standard for representing data as a graph, but it also includes a vocabulary to semantically describe the things depicted by the graph. First, let's add a pinch of RDF and semantics to our family graph:
The RDF graph of the Landais-Todorov family
Subject | Predicate | Object |
http://elandais.fr/eric | http://xmlns.com/foaf/0.1/givenName | "Eric" |
http://elandais.fr/eric | http://example.com/isInLoveWith | http://elandais.fr/olga |
http://elandais.fr/salomé | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://example.com/Female |
... | ... | ... |
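The table makes a distinction that is easy to miss: subjects and predicates are URIs, while an object can be either a URI (http://elandais.fr/olga) or a literal value ("Eric"). This Python sketch models the two kinds of terms with hypothetical `IRI` and `Literal` classes and prints each triple on one line in the style of N-Triples, a W3C line-based RDF syntax where URIs go between angle brackets and literals between double quotes. Datatypes and language tags are deliberately left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """A resource identified by a URI (strictly, an IRI in RDF 1.1)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain string value, without datatype or language tag (a simplification)."""
    value: str


Term = Union[IRI, Literal]


def term_to_ntriples(term: Term) -> str:
    """Write an IRI between angle brackets and a literal between double quotes."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape the characters that cannot appear raw inside an N-Triples string.
    escaped = (term.value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(s: IRI, p: IRI, o: Term) -> str:
    """One triple per line, terminated by ' .'."""
    return f"{term_to_ntriples(s)} {term_to_ntriples(p)} {term_to_ntriples(o)} ."


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The three triples from the table: objects can be IRIs or literals.
triples: list[tuple[IRI, IRI, Term]] = [
    (IRI("http://elandais.fr/eric"), IRI("http://xmlns.com/foaf/0.1/givenName"), Literal("Eric")),
    (IRI("http://elandais.fr/eric"), IRI("http://example.com/isInLoveWith"), IRI("http://elandais.fr/olga")),
    (IRI("http://elandais.fr/salomé"), IRI(RDF_TYPE), IRI("http://example.com/Female")),
]

for triple in triples:
    print(to_ntriples(*triple))
```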
Here are the same triples in Turtle, a standard syntax and the most common and convenient way to write RDF by hand:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .

<http://elandais.fr/eric> foaf:givenName "Eric" ;
    ex:isInLoveWith <http://elandais.fr/olga> .

<http://elandais.fr/salomé> rdf:type ex:Female .

...
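In the Turtle above, the semicolon ends one predicate-object pair while keeping the same subject, so the statement about Eric still yields two separate triples. The Python sketch below mimics that expansion with a hypothetical `expand_predicate_list` helper; it uses the full URIs from the triple table so that prefixes don't come into play, and it keeps the quotes around "Eric" as a crude way to mark a literal.

```python
# The Turtle statement about Eric: one subject, two predicate-object pairs
# separated by ';'. Full URIs are used instead of prefixed names.
ERIC = "http://elandais.fr/eric"

# One subject with its list of (predicate, object) pairs, i.e. what ';' groups.
grouped: dict[str, list[tuple[str, str]]] = {
    ERIC: [
        ("http://xmlns.com/foaf/0.1/givenName", '"Eric"'),  # quotes mark a literal
        ("http://example.com/isInLoveWith", "http://elandais.fr/olga"),
    ],
}


def expand_predicate_list(
    grouped: dict[str, list[tuple[str, str]]],
) -> list[tuple[str, str, str]]:
    """Repeat the subject for every predicate-object pair, giving plain triples."""
    return [
        (subject, predicate, obj)
        for subject, pairs in grouped.items()
        for predicate, obj in pairs
    ]


for triple in expand_predicate_list(grouped):
    print(triple)
```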
I have added a lot of new things, so the clearest way to explain them is to go through the changes one by one.
Let's start with the prefixes. If you are familiar with XML namespaces, prefixes work in roughly the same way in RDF. They are a shortcut: everything in RDF is identified with a URI, and prefixes save you from writing the full URI every time, since you only need to declare them at the beginning of your file. For instance, foaf:age actually stands for http://xmlns.com/foaf/0.1/age, which links to the specification of the Friend Of A Friend (FOAF) vocabulary. This system is very convenient because it lets anyone who controls an Internet domain create their own age property without clashing with anyone else's.
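Prefix expansion is nothing more than string concatenation: the namespace bound to the prefix, followed by the local name. Here is a minimal Python sketch, assuming the FOAF namespace http://xmlns.com/foaf/0.1/ given above; the `expand` function and the `mine:` prefix bound to http://yourname.net/properties# are hypothetical, the latter standing in for anyone's own domain.

```python
# Prefix declarations: prefix -> namespace URI.
# 'mine' is a hypothetical namespace on someone's own domain.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "mine": "http://yourname.net/properties#",
}


def expand(prefixed_name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'foaf:age' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {prefixed_name!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # The full URI is the namespace followed by the local name.
    return prefixes[prefix] + local


print(expand("foaf:age"))  # http://xmlns.com/foaf/0.1/age
print(expand("rdf:type"))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(expand("mine:age"))  # http://yourname.net/properties#age
```

The last two calls show why domains matter: foaf:age and mine:age share a local name but expand to different URIs, so the two age properties never collide.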
For instance, I could create the property http://colin.maudry.com/properties#age. I haven't created a property for the age of a person, but I have created the DITA ontology, which resides at http://colin.maudry.com/ontologies/dita# and whose recommended prefix is dita:.
I could also have used ex:age, but FOAF is a very popular vocabulary for describing people on the Semantic Web, and is consequently well understood by the people and software that deal with RDF.
URIs (here, URLs used as identifiers) are used to identify everything in RDF. They are very convenient because they rely on the Web's domain name system, which makes them unique worldwide. If you own the domain yourname.net, you control all the URIs based on this domain.
In our data, the Landais family uses identifiers from the domain elandais.fr, because Eric Landais owns that domain. It is also possible to rely on an organization to create identifiers. For instance, I could use the URL of my About.me page (http://about.me/ColinMaudry) to identify myself, since I have some control over what happens when you visit it.
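Because the host part of a URI names a domain, you can tell from the URI alone which domain, and therefore which owner, it falls under. Here is a small Python sketch using the standard `urllib.parse.urlsplit`; the function `is_under_domain` is hypothetical, and it only inspects the URI's syntax: it cannot tell you who actually registered the domain.

```python
from urllib.parse import urlsplit


def is_under_domain(uri: str, domain: str) -> bool:
    """True if the URI's host is `domain` itself or one of its subdomains."""
    host = (urlsplit(uri).hostname or "").lower()
    domain = domain.lower()
    # Requiring a dot before the domain avoids matching e.g. notelandais.fr.
    return host == domain or host.endswith("." + domain)


# URIs mentioned in this article, paired with the domain they fall under.
examples = [
    ("http://elandais.fr/eric", "elandais.fr"),
    ("http://colin.maudry.com/ontologies/dita#", "maudry.com"),
    ("http://about.me/ColinMaudry", "about.me"),
]

for uri, domain in examples:
    print(f"{uri} is under {domain}: {is_under_domain(uri, domain)}")
```

The About.me URI falls under about.me rather than a domain of my own: that is the trade-off of relying on an organization for identifiers, since the organization, not me, controls the domain.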