The RDF graph

First things first: the Resource Description Framework (RDF) has been a W3C recommendation since 1999, with a revision in 2004 and an update to version 1.1 in 2014. Its purpose is to achieve for data what the Web has enabled for documents. Since the dawn of the Web, one of its main strengths has been to let users interlink documents easily, building browsing paths for other Web users. One Web page can be the starting point to any number of other Web pages. With RDF, it's the same, but with data.

What is a graph?

A graph is a structure in which each node can be related to any number of other nodes. Without further ado, let's see what our family data looks like as a graph (not RDF yet):

The graph of the Landais-Todorov family

This graph doesn't represent all the data, but it shows the key features of a graph model: each node can be linked directly to any number of other nodes via directed relations. For instance, if we were interested in uncle/nephew relationships and wanted a more direct connection, we could simply link Isaac and Salomé to Boris with a hasUncle property, without affecting the rest of the graph. The data model is not limiting.

Another way to represent a graph is to write the triples that compose it. A triple is a statement composed of a subject, a property (or predicate) and an object.

Subject	Predicate	Object
Olga	isInLoveWith	Eric
Boris	hasSister	Olga
Salomé	hasBrother	Isaac
...	...	...
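To make the triple idea concrete, here is a minimal Python sketch, assuming we model the graph as a plain set of (subject, predicate, object) tuples. The two hasUncle triples are the hypothetical shortcut discussed above, and the names `graph` and `objects_of` are mine for illustration, not part of any RDF library.

```python
# A graph as a set of (subject, predicate, object) triples.
graph = {
    ("Olga", "isInLoveWith", "Eric"),
    ("Boris", "hasSister", "Olga"),
    ("Salomé", "hasBrother", "Isaac"),
}

# Keep a copy to show that adding relations leaves existing triples untouched.
before = set(graph)

# Adding a more direct relation is just adding triples.
graph.add(("Isaac", "hasUncle", "Boris"))
graph.add(("Salomé", "hasUncle", "Boris"))


def objects_of(graph, subject, predicate):
    """Return every node reachable from `subject` through `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


print(objects_of(graph, "Salomé", "hasUncle"))  # {'Boris'}
```

Because relations are directed, asking for the hasUncle of Boris returns nothing: the triples only point from the nephew and niece to the uncle.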

From a graph to an RDF graph

RDF is a standard used to represent data as a graph, but it also includes a vocabulary to semantically describe the things depicted by the graph. First, let's add a pinch of RDF and semantics to our family graph:

The RDF graph of the Landais-Todorov family

Subject	Predicate	Object
http://elandais.fr/eric	http://xmlns.com/foaf/0.1/givenName	"Eric"
http://elandais.fr/eric	http://example.com/isInLoveWith	http://elandais.fr/olga
http://elandais.fr/salomé	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://example.com/Female
...	...	...

Here are the same triples in the standard Turtle syntax, the most common and convenient way to write RDF by hand:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .

<http://elandais.fr/eric> foaf:givenName "Eric" ;
	ex:isInLoveWith <http://elandais.fr/olga> .
<http://elandais.fr/salomé> rdf:type ex:Female .
...

I have added a lot of new stuff, so the clearest way to explain it is to go through the changes one by one.

What are the xx: prefixes?

If you are familiar with XML namespaces, prefixes in RDF work roughly the same way. They are a shortcut: everything in RDF is identified with a URI, and prefixes spare you from writing the full URI every time. You only need to declare the prefixes at the beginning of your file. For instance, foaf:age is actually http://xmlns.com/foaf/0.1/age, which links to the specification of the Friend Of A Friend vocabulary. This system is very convenient, as it enables anyone to create their own age property as long as they control an Internet domain.

For instance, I could create the property http://colin.maudry.com/properties#age. I haven't created a property for the age of a person, but I have created the DITA ontology, which resides at http://colin.maudry.com/ontologies/dita#, and for which the recommended prefix is dita:.

I also could have used ex:age, but FOAF is a very popular vocabulary on the Semantic Web to describe people, and is consequently very well understood by people and software that deal with RDF.
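The prefix mechanism boils down to string concatenation, which the following Python sketch illustrates. It assumes a plain dictionary stands in for the @prefix declarations; the `PREFIXES` table and the `expand` function are hypothetical helpers written for this example, not the API of an RDF library.

```python
# Prefix declarations, mirroring the @prefix lines of a Turtle file.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dita": "http://colin.maudry.com/ontologies/dita#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'foaf:age' into a full URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # The full URI is the namespace followed by the local name.
    return prefixes[prefix] + local_name


print(expand("foaf:age"))  # http://xmlns.com/foaf/0.1/age
print(expand("rdf:type"))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```

An undeclared prefix raises an error here, much like a Turtle parser would refuse a file that uses a prefix it never declared.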

I see URLs instead of people's names

URIs (URLs used as identifiers) are used to identify everything in RDF. They are very convenient because they rely on the Web technologies that ensure an identifier is unique worldwide. If you own the domain yourname.net, you control all the URIs that are based on this domain.

In our data, we see that the Landais family uses identifiers from the domain elandais.fr, because Eric Landais owns the domain. It is also possible to rely on an organization to create identifiers. For instance, I could use the URL of my About.me page to identify myself (http://about.me/ColinMaudry), since I have some control over what happens when you visit it.

The rectangles are strings, right?

They can be strings, but they can also be integers, dates, or a range of other types. They are grouped under the generic term "literal" and, unlike the other nodes we have seen so far, they don't have a URI. Their particularity is that they can only be the object of a property, never the subject. They are consequently the endpoints of a graph.
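The rule that a literal can only sit in the object position can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the `URI` and `Literal` classes and the `make_triple` function are hypothetical names for illustration, and the sketch ignores other kinds of nodes that RDF defines but this article doesn't cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: object  # a string, an integer, a date, ...


def make_triple(subject, predicate, obj):
    """Build a triple, rejecting literals anywhere but the object position."""
    if not isinstance(subject, URI):
        raise TypeError("the subject must be a URI, not a literal")
    if not isinstance(predicate, URI):
        raise TypeError("the predicate must be a URI")
    if not isinstance(obj, (URI, Literal)):
        raise TypeError("the object must be a URI or a literal")
    return (subject, predicate, obj)


eric = URI("http://elandais.fr/eric")
given_name = URI("http://xmlns.com/foaf/0.1/givenName")

# A literal as the object is accepted.
triple = make_triple(eric, given_name, Literal("Eric"))

# A literal as the subject is rejected.
try:
    make_triple(Literal("Eric"), given_name, eric)
except TypeError as error:
    print(error)  # the subject must be a URI, not a literal
```

Since nothing can start from a literal, no further triple can extend the graph beyond it, which is why literals end up at the edges of the drawing.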

The graph looks quite messy...

That's what happens when you can link each node to any number of other nodes. Drawing a graph this way only works for small graphs that are useful for future reference. For example, the DITA ontology has a graph representation drawn the same way. For small quantities of data you can view it as Turtle. Otherwise, you need to query it.
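To give a feel for what querying a graph means, here is a rough Python sketch of triple pattern matching, where each position of the pattern is either a fixed value or a wildcard. It assumes the graph is held as a set of tuples, and the `match` function is a hypothetical helper of mine; real RDF stores use a dedicated query language rather than hand-written loops like this one.

```python
ERIC = "http://elandais.fr/eric"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The three triples from the table above, as (subject, predicate, object).
graph = {
    (ERIC, "http://xmlns.com/foaf/0.1/givenName", "Eric"),
    (ERIC, "http://example.com/isInLoveWith", "http://elandais.fr/olga"),
    ("http://elandais.fr/salomé", RDF_TYPE, "http://example.com/Female"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    results = []
    for s, p, o in graph:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        results.append((s, p, o))
    return sorted(results)  # sorted so the output order is stable


# Everything we know about Eric: two triples.
print(len(match(graph, subject=ERIC)))  # 2

# Every resource that has a type.
for s, p, o in match(graph, predicate=RDF_TYPE):
    print(s, "is a", o)
```

The point of the sketch is that a query describes the shape of the triples you want and lets the machine find them, which scales far better than reading a drawing.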