Resources published on data.gouv.fr… but unavailable

Reading time: 2 minutes


I used data.gouv.fr's RDF metadata and a home-made script to check the availability* of every resource (file) published or simply referenced on http://data.gouv.fr. The French open data portal also lets publishers reference data hosted on their own servers, which is why some resources live outside data.gouv.fr. The script also records the server response time for each resource.
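The availability check described above can be sketched in Python with the standard library. This is a minimal sketch, not the script actually used. The function name `check_resource`, the 30-second timeout and the choice of a plain GET request are assumptions. GET is preferred over HEAD here because some servers answer HEAD with 405 METHOD NOT ALLOWED, one of the codes counted as unavailable. The elapsed time covers the period until the status line and headers arrive, including any redirects that urllib follows by default.

```python
import time
import urllib.request
from urllib.error import HTTPError


def check_resource(url, timeout=30.0):
    """Hypothetical helper: return (HTTP status or None, elapsed seconds).

    None means no HTTP response was obtained at all (malformed URL,
    DNS failure, refused connection, timeout...).
    """
    start = time.perf_counter()
    try:
        # Plain GET; urlopen returns once the status line and headers are
        # read, and the body (the file itself) is never read here.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except HTTPError as err:
        # 4xx and 5xx answers are raised as HTTPError but carry the code.
        status = err.code
    except (ValueError, OSError):
        # ValueError: unparsable URL; OSError covers URLError and timeouts.
        status = None
    elapsed = time.perf_counter() - start
    return status, elapsed
```

Calling `check_resource("http://www.data.gouv.fr/")` returns a pair such as a status code and a duration in seconds. Running it over every resource URL found in the RDF metadata yields both the availability and the response-time data.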

This availability data is updated… once in a while. You can download the raw data here.

  • As of August 6th, 2015 (+29 days), 5,106 resources (+518) out of 53,635 (+1,177) are unavailable, that is 9.5 % (+0.8 percentage points). This rise should not be over-interpreted, as I made a few changes to the checking process (see the change log).
  • As of July 8th, 2015, 4,588 resources out of 52,458 are unavailable, that is 8.7 %
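The percentages in the two bullets can be recomputed from the raw counts. The quick check below uses a hypothetical helper name. It also shows that the 0.8 rise is a difference in percentage points between the two rates, not a relative increase.

```python
def unavailability_rate(unavailable, total):
    """Share of unavailable resources, in percent (hypothetical helper)."""
    return 100.0 * unavailable / total


july = unavailability_rate(4588, 52458)     # about 8.75 %
august = unavailability_rate(5106, 53635)   # about 9.52 %

print(f"July 8th:   {july:.1f} %")
print(f"August 6th: {august:.1f} %")
print(f"Change:     {august - july:+.1f} percentage points")
```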

Go to the data visualisation interface.

The causes of unavailability

There are probably dozens of reasons for a resource to be unavailable, but here are two I have clearly identified:

  • Certain URLs are malformed (see the list: 41 resources as of August 7th, 2015)
  • Certain cities and regions had their open data portals indexed (or “harvested”) by the data.gouv.fr team a while ago. Unfortunately, because the process was manual, this indexing was not maintained over time (portals built on common platforms such as CKAN are indexed automatically). As a result, the links to their resources became obsolete. This issue is hard to measure, as I don’t think the list of affected organizations is public. The only way out: the affected organizations must either migrate to a platform that data.gouv.fr supports natively (such as CKAN) or develop their own “harvester”.
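How the 41 malformed URLs were detected is not stated. The sketch below shows one plausible rule, and it is hypothetical: flag URLs that contain whitespace, lack an http, https or ftp scheme, lack a host, or carry an invalid port. It relies on Python's standard `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def is_malformed(url):
    """Hypothetical check: True if url cannot be a valid resource link.

    The criteria are assumptions for illustration; the rule actually used
    to build the list of malformed URLs is not published.
    """
    url = url.strip()
    # Empty strings and unencoded inner spaces are not valid URLs.
    if not url or any(c.isspace() for c in url):
        return True
    try:
        parts = urlsplit(url)
        _ = parts.port  # raises ValueError on a non-numeric or out-of-range port
    except ValueError:
        return True
    # A usable link needs a known scheme and a host ("netloc").
    return parts.scheme not in ("http", "https", "ftp") or not parts.netloc
```

For instance, `www.example.org/file.csv` (no scheme) and `http:/example.org/file.csv` (missing slash, hence no host) would both be flagged.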

Server response time

As of August 6th, you’ll be thrilled to know that the overall average response time was 333 milliseconds. Not bad!

The average response time for gouv.fr domains (the French equivalent of .gov) was 409 milliseconds, but only 77 milliseconds for *.data.gouv.fr subdomains!

Now, the bad news: 649 resources (1.2 %) have a response time over 3 seconds, and 170 (0.32 %) over 10 seconds…
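The figures above boil down to averages grouped by domain and two threshold counts. The sketch below assumes the timings are available as (URL, seconds) pairs for resources that answered. The function names, the sample URLs and their timings are made up for illustration. The post does not say whether its gouv.fr average includes data.gouv.fr subdomains. Here it does, since data.gouv.fr is itself under gouv.fr.

```python
from statistics import mean
from urllib.parse import urlsplit


def in_domain(host, domain):
    """True if host is `domain` itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def summarize(timings):
    """Summarize (url, seconds) pairs collected by an availability check."""
    rows = [(urlsplit(url).hostname or "", secs) for url, secs in timings]

    def avg_ms(domain=None):
        # Average in milliseconds, optionally restricted to one domain.
        values = [s for host, s in rows if domain is None or in_domain(host, domain)]
        return round(1000 * mean(values)) if values else None

    over_3s = sum(1 for _, s in rows if s > 3)
    over_10s = sum(1 for _, s in rows if s > 10)
    return {
        "mean_ms": avg_ms(),
        "gouv_fr_mean_ms": avg_ms("gouv.fr"),
        "data_gouv_fr_mean_ms": avg_ms("data.gouv.fr"),
        "over_3s": over_3s,
        "over_10s": over_10s,
        "over_3s_pct": 100 * over_3s / len(rows) if rows else 0.0,
    }


sample = [  # made-up timings, for illustration only
    ("http://static.data.gouv.fr/a.csv", 0.05),
    ("https://www.example.gouv.fr/b.xls", 0.40),
    ("http://opendata.example.org/c.json", 12.0),
]
print(summarize(sample))
```

Matching on the parsed hostname with a leading dot, rather than a plain substring test, avoids counting a host such as `notgouv.fr` as part of gouv.fr.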

In the next episode, you’ll see which organizations have the fastest servers and which have the slowest. I can already tell you that those who chose to host (and not only reference) their data on data.gouv.fr won’t be far from the top performers 😉

*The following HTTP errors are considered proof of unavailability:

  • 400 BAD REQUEST
  • 401 AUTHORIZATION REQUIRED
  • 403 FORBIDDEN
  • 404 NOT FOUND
  • 405 METHOD NOT ALLOWED
  • 410 GONE
  • 500 INTERNAL SERVER ERROR
  • 502 BAD GATEWAY
  • 502 PROXY ERROR
  • 503 SERVICE TEMPORARILY UNAVAILABLE
  • 504 GATEWAY TIMEOUT
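The rule in this list can be expressed as a simple lookup on the status code. The names in this sketch are hypothetical. The two 502 entries collapse into one code, since servers differ only in the reason phrase they send. Codes absent from the list, such as 429, count as available. Treating a missing response (timeout, DNS failure, malformed URL) as unavailable is an assumption, as the list only covers HTTP errors.

```python
# Status codes listed as proof of unavailability (both 502 variants share
# one code; only their reason phrases differ).
UNAVAILABLE_CODES = frozenset({400, 401, 403, 404, 405, 410, 500, 502, 503, 504})


def is_unavailable(status):
    """Classify one check result.

    `status` is an HTTP status code, or None when no HTTP response was
    obtained at all. Counting None as unavailable is an assumption.
    """
    return status is None or status in UNAVAILABLE_CODES


def unavailable_share(statuses):
    """Return (number unavailable, percentage unavailable)."""
    statuses = list(statuses)
    bad = sum(1 for s in statuses if is_unavailable(s))
    pct = 100 * bad / len(statuses) if statuses else 0.0
    return bad, pct
```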