Freie Universität Berlin DERI

The Linking Open Data cloud diagram

LATC Project

This web page is the home of the LOD cloud diagram. This image shows datasets that have been published in Linked Data format, by contributors to the Linking Open Data community project and other individuals and organisations. It is based on metadata collected and curated by contributors to the CKAN directory. Clicking the image will take you to an image map, where each dataset is a hyperlink to its homepage.

The diagram is maintained by Richard Cyganiak (DERI, NUI Galway) and Anja Jentzsch (Freie Universität Berlin). For any questions and comments, please email richard@cyganiak.de and mail@anjajentzsch.de.

Last updated: 2011-09-19

Linking Open Data cloud diagram, large version

Can I use this diagram in my slides, paper, book? #

Yes. This work is available under a CC-BY-SA license. This means you can include it in any other work under the condition that you give proper attribution. If you create derivative works (such as modified or extended versions of the diagram), then you must also license them as CC-BY-SA.

Please give attribution along the following lines:

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

The diagram is available in PNG, PDF and SVG versions.

There is also a colored-by-theme version in PNG, PDF and SVG.

How can I get my dataset into the diagram? #

First, make sure that you publish data according to the Linked Data principles. We interpret this as:

If your dataset meets these criteria:

  1. Please add it to CKAN, the open registry of data and content packages. See the Guidelines for Collecting Metadata on Linked Datasets in CKAN for more details. (Before creating a new CKAN package, please double-check whether a package already exists for your dataset.)
  2. We provide a handy CKAN record validator; use it to check that at least the minimum required information is present.
  3. Email richard@cyganiak.de and mail@anjajentzsch.de.
  4. We will review the CKAN record and add it to the CKAN lodcloud group.
  5. The dataset will be included in the next update of the diagram.

Why is my dataset not included? #

See the question above—please make sure that it meets the criteria, is in CKAN, and that we know about it. Other possible reasons why we exclude some datasets are:

Datasets of these kinds are important and valuable. They are, however, outside of the scope that we (somewhat arbitrarily) choose to display in this particular diagram.

Are all these datasets really open? #

Probably not. Unfortunately, most publishers do not publish their data with an explicit license. This leaves re-users in the dark about the specific rights that are granted or reserved by the publisher.

Given this state of affairs, we take a liberal view of what we consider “open”. If the data is openly accessible from a network point of view – that is, it's not behind an authorization check or paywall – then we will probably add it to the Cloud. Note that we keep track of explicit licenses on the Data Hub whenever we know about them. We aspire to provide a version color-coded by license in the future.

Before using any data, you should always check the publisher's website for the terms and conditions. If you don't find anything, then the safest course of action is to assume that the publisher reserves all rights…

(Note that the Data Hub takes a stricter view on openness and considers a dataset “open” only if it has an explicit license that meets the Open Definition.)

Why don't you also show XYZ in the diagram? #

This diagram shows a particular perspective on the Web of Data. There are many other possible, perfectly valid, and valuable perspectives as well, that focus on other data formats, on other publishing methods, and on highlighting other aspects besides size, topic and interlinks. We chose to show this particular view, and encourage everyone to explore and visualise other views as well. See the Related Resources section for similar visualisations.

Can I get the raw data? #

Yes. The diagram is based on metadata from the lodcloud group on CKAN. This data is fully accessible through the CKAN API.

There are some code modules (Python, PHP, Drupal, Perl etc.) that provide convenient wrappers around much of the CKAN API. For full details of these, please consult: http://wiki.okfn.org/ckan/related.
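As a rough illustration of reading the group's metadata without one of those wrappers, here is a minimal Python sketch. It assumes a CKAN instance that exposes the version 3 Action API (`package_search` with an `fq` filter); the API that was current when this page was written may have differed. The base URL `https://ckan.example.org` is a placeholder, and the helper names `group_search_url` and `dataset_names` are made up for this example.

```python
import json
from urllib.parse import urlencode

# Assumption: placeholder base URL; replace it with the CKAN instance you use.
CKAN_BASE = "https://ckan.example.org"


def group_search_url(group, rows=100, start=0, base=CKAN_BASE):
    """Build a package_search URL restricted to the packages of one group."""
    query = urlencode({"fq": f"groups:{group}", "rows": rows, "start": start})
    return f"{base}/api/3/action/package_search?{query}"


def dataset_names(response_text):
    """Extract the package names from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [package["name"] for package in payload["result"]["results"]]
```

To use it, fetch `group_search_url("lodcloud")` with any HTTP client (for example `urllib.request.urlopen`), pass the response body to `dataset_names`, and increase `start` to page through the results.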

In the future we plan to make the data available directly as RDF, using the voiD vocabulary.

How is the diagram generated? #

The diagram is based on metadata from the lodcloud group on CKAN.

In order to generate the diagram, we access CKAN via the CKAN API to get JSON for each of the datasets in the lodcloud group. We then automatically generate a new OmniGraffle file that contains the previous version of the LOD cloud and an unsorted bunch of all new datasets.

Those are manually arranged by their cluster membership to form a beautiful and fluffy cloud. Dataset names are taken from the shortname in CKAN if one is given, and otherwise from the provided title. If the shortname is still too long, we manually tweak it in OmniGraffle.
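The two automatic parts of this process (finding datasets that are new since the previous version, and picking a label for each) could look roughly like the Python sketch below. This is not the script we use. It assumes each CKAN record is a dictionary with `name`, `title` and `extras` keys, and that the shortname is stored as an extra field called `shortname`; the function names are hypothetical.

```python
def display_label(package):
    """Pick the label for a dataset: the shortname if given, else the title."""
    extras = package.get("extras") or {}
    # Assumption: extras may arrive as a dict or as a list of key/value pairs,
    # depending on the CKAN API version.
    if isinstance(extras, list):
        extras = {item["key"]: item["value"] for item in extras}
    shortname = (extras.get("shortname") or "").strip()
    return shortname or package.get("title") or package["name"]


def new_datasets(current_names, previous_names):
    """Return names present now but absent from the previous diagram."""
    previous = set(previous_names)
    return [name for name in current_names if name not in previous]
```

The arrangement into clusters and any further shortening of labels stay manual, as described above.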

PDF and PNG versions are exported from OmniGraffle. The SVG is generated from the OmniGraffle file using a byzantine collection of home-grown Ruby and JS scripts.

I want to customize the diagram. Can I get the source file? #

The license allows modifications. We don't share the OmniGraffle sources, but the PDF and SVG versions can be edited using appropriate software. The SVG is suitable for manipulation through scripting. The raw data is also available.
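As an example of what scripting against the SVG might involve, the following Python sketch changes the fill color of every `circle` element using only the standard library. It assumes the datasets are drawn as SVG `circle` elements, which may not match the actual file; inspect the SVG first and adjust the element name and attributes. The function name `recolor_circles` is made up for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep the SVG namespace as the default namespace when serialising.
ET.register_namespace("", SVG_NS)


def recolor_circles(svg_text, fill):
    """Set the fill of every <circle>; return the new SVG and the count."""
    root = ET.fromstring(svg_text)
    changed = 0
    for circle in root.iter(f"{{{SVG_NS}}}circle"):
        circle.set("fill", fill)
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Remember that modified versions of the diagram must also be licensed as CC-BY-SA.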

When will you update the diagram? #

We update the diagram every few months. Ask us if you need a more precise answer.

Can I get the older versions? #

Yes.

Version     White        Colored      Datasets
Latest      png pdf svg  png pdf svg  295
2011-09-19  png pdf svg  png pdf svg  295
2010-09-22  png pdf svg  png pdf svg  203
2009-07-14  png pdf svg               95
2009-03-27  png pdf svg  png pdf svg  93
2009-03-05  png pdf svg  png pdf svg  89
2008-09-18  png pdf svg               45
2008-03-31  png pdf svg               34
2008-02-28  png pdf svg               32
2007-11-10  png pdf svg               28
2007-11-07  png                       28
2007-10-08  png                       25
2007-05-01  png                       12

What exactly does the diagram mean? #

The image shows datasets that are published in Linked Data format and are interlinked with other datasets in the cloud.

The size of the circles corresponds to the number of triples in each dataset. The numbers are usually provided by the dataset publishers, and are sometimes rough estimates.

Circle size  Triple count
Very large   >1B
Large        1B-10M
Medium       10M-500k
Small        500k-10k
Very small   <10k
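The size classes can be written as a small Python function. This is only a sketch: the table does not say which class a count exactly on a boundary belongs to, so the code assumes that such a value falls into the larger class, except at 1B, where the table says strictly greater. The function name `circle_size` is made up for this example.

```python
def circle_size(triples):
    """Map a triple count to the circle size class used in the diagram."""
    if triples > 1_000_000_000:
        return "Very large"
    # Assumption: boundary values (10M, 500k, 10k) go to the larger class.
    if triples >= 10_000_000:
        return "Large"
    if triples >= 500_000:
        return "Medium"
    if triples >= 10_000:
        return "Small"
    return "Very small"
```

For instance, a dataset reporting 25 million triples would be drawn as a "Large" circle under these assumptions.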

The arrows indicate the existence of at least 50 links between two datasets. A link, for our purposes, is an RDF triple where subject and object URIs are in the namespaces of different datasets.

The direction of the arrows indicates the dataset that contains the links, e.g., an arrow from A to B means that dataset A contains RDF triples that use identifiers from B. Bidirectional arrows usually indicate that the links are mirrored in both datasets. The thickness corresponds to the number of links.

Arrow thickness  Triple count
Thick            >100k
Medium           100k-1k
Thin             <1k
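To make the link definition and the arrow classes concrete, here is a Python sketch. The diagram itself relies on link counts recorded in CKAN, not on a computation like this. The sketch assumes triples are given as (subject, predicate, object) string tuples, that each dataset has a single namespace prefix, and that a triple is stored in the dataset of its subject. Counts exactly on a class boundary are assigned to "Medium" by assumption. All function names are hypothetical.

```python
from collections import Counter

MIN_LINKS = 50  # fewer links than this and no arrow is drawn


def dataset_of(uri, namespaces):
    """Return the dataset whose namespace is the longest prefix of the URI."""
    matches = [name for name, ns in namespaces.items() if uri.startswith(ns)]
    return max(matches, key=lambda name: len(namespaces[name]), default=None)


def count_links(triples, namespaces):
    """Count triples whose subject and object belong to different datasets."""
    links = Counter()
    for subject, _predicate, obj in triples:
        source = dataset_of(subject, namespaces)
        target = dataset_of(obj, namespaces)
        if source and target and source != target:
            links[(source, target)] += 1
    return links


def arrow_thickness(link_count):
    """Map a link count to an arrow class, or None if no arrow is drawn."""
    if link_count < MIN_LINKS:
        return None
    if link_count > 100_000:
        return "Thick"
    # Assumption: counts of exactly 1k or 100k are treated as "Medium".
    if link_count >= 1_000:
        return "Medium"
    return "Thin"
```

A bidirectional arrow corresponds to both `(A, B)` and `(B, A)` reaching the threshold in the result of `count_links`.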

Here are some similar or related efforts that visualise the Web of Data on a high level.