Linked Data Finland

Living Laboratory Data Service for the Semantic Web

This site is the Living Laboratory of the Linked Data Finland research initiative, conducted by the Semantic Computing Research Group at Aalto University in collaboration with the University of Helsinki and a large consortium of Finnish public organizations and companies.

Our goal is to make life easier for both publishers and consumers of structured data on the Web. We base our work on the Linked Data paradigm and its stack of standards, which combines an expressive, semantic data model (RDF) with standardized access mechanisms (SPARQL and live HTTP URIs).
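As a rough illustration of the RDF data model mentioned above, the following sketch treats a graph as a set of subject-predicate-object triples and matches a pattern against it, loosely in the spirit of a SPARQL triple pattern. The URIs and the match helper are made up for this example and are not part of the service.

```python
# A toy illustration of the RDF data model: a graph is a set of
# (subject, predicate, object) triples. All URIs below are hypothetical.
EX = "http://example.org/"

triples = {
    (EX + "Helsinki", EX + "type", EX + "City"),
    (EX + "Helsinki", EX + "locatedIn", EX + "Finland"),
    (EX + "Espoo", EX + "type", EX + "City"),
}


def match(graph, s=None, p=None, o=None):
    """Return matching triples in sorted order; None acts as a wildcard,
    loosely mirroring a variable in a SPARQL triple pattern."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Subjects of every triple stating that something is a City.
cities = [s for s, _, _ in match(triples, p=EX + "type", o=EX + "City")]
```

A real deployment would use an RDF library and a SPARQL endpoint rather than in-memory tuples; the sketch only shows the shape of the data and of a query.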

5-star Linked Data

The baseline of our work is the 5-star Linked Data model, proposed originally by Tim Berners-Lee.

★ Make data available on the Web in whatever format.
★★ Make data available as structured data (e.g., Excel instead of an image scan of a table).
★★★ Use non-proprietary formats (e.g., CSV instead of Excel format).
★★★★ Use URIs to denote things, so that people can point at your data.
★★★★★ Link your data to other data to provide context.

7-star Linked Data Service

However, in our opinion, providing 5-star Linked Data is just the beginning. To actually make use of the datasets, consumers need more support in getting to know and accessing them, as well as a better grasp of their quality and provenance. To this end, we extend the model with two additional stars:

★★★★★★ Provide your data with a schema and documentation so that people can understand and re-use your data easily.
★★★★★★★ Validate your data and denote its provenance so that people can trust the quality of your data.
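The star model can be read as a checklist. The sketch below assumes that stars are cumulative, as in Berners-Lee's original scheme, so that a dataset earns a star only if it also meets every earlier criterion. The criterion names are invented for this example.

```python
# Criteria in star order; the key names are illustrative, not from the source.
CRITERIA = [
    "on_web",                     # 1 star
    "structured",                 # 2 stars
    "non_proprietary",            # 3 stars
    "uses_uris",                  # 4 stars
    "linked",                     # 5 stars
    "documented_schema",          # 6 stars
    "validated_with_provenance",  # 7 stars
]


def star_rating(dataset):
    """Count stars cumulatively: stop at the first unmet criterion."""
    stars = 0
    for key in CRITERIA:
        if not dataset.get(key, False):
            break
        stars += 1
    return stars


# A CSV file published on the Web meets the first three criteria only.
csv_file = {"on_web": True, "structured": True, "non_proprietary": True}
rating = star_rating(csv_file)
```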

This added support should come with as little extra work as possible for the data publisher. Our hypothesis is that much of it can be provided automatically, building on the Linked Data core. A data publisher needs only to provide their data in the RDF format, and the portal will do the rest automatically. See the overview paper (in ESWC 2014 Proceedings, Springer-Verlag) for more details about the underlying ideas.

Further Information

On the left, you will find more information about the project and the datasets we are working on. Selecting a dataset shows more information about the services related to it. Note that it is also possible to publish your own data at the service. The following sections describe in detail the breadth of services that can be tied to a dataset: first the services for data consumers, then those for data publishers, and finally the services provided to other computer systems.

Services Offered to Data Consumers

To test our general hypothesis, we have engaged with multiple communities to discover the services they need in order to 1) evaluate a dataset for fitness for purpose and 2) use the dataset efficiently. We have then adopted or developed tools on top of the platform to provide those services.

For getting to know a dataset, the service provides the following services:

For end users of a dataset, the service provides the following services:

Most often, however, customized end-user applications have been created, based on the common APIs and using common modules but tuned to the needs of particular use cases. For example:

Services Offered to Data Publishers

In addition to end users, we also support data publishers in converting their datasets to RDF, as well as in maintaining them and ensuring their quality.

For converting legacy datasets into RDF for publication in the service, either the Karma tool or the RDF export extension to OpenRefine can be used. Alternatively, people with technical experience may choose to use, for example, an RML processor together with a mapping definition written in the RDF Mapping Language.

For editing data stored in the service, the SAHA tool also provides an editing interface, used successfully in production both by the University of Colorado on the WW1LOD dataset and by dozens of volunteer librarians in the BookSampo project. People with a more direct understanding of RDF may also use the Snapper tool to edit their data.

As regards data quality, the tools for evaluating datasets, such as Aether, serve not only end users but have also enabled content publishers themselves to discover errors and abnormalities in their data. The RDF Grapher tool can also be used to validate the syntax of RDF data and to visually spot errors in the relations between data objects. The OWL RL Reasoner and N3 Logic Rule Reasoner tools can be used for discovering logical inconsistencies in the datasets.

For converting an RDF dataset from one format (e.g. Turtle, RDF/XML) to another, the RDF Serializer tool can be used. A similar service, the OWL Syntax Converter, converts OWL ontologies from one format (e.g. Turtle, OWL Functional Syntax) to another.

Finally, one of the promises of using Linked Data techniques for publishing datasets is to make those datasets easier to integrate with other data. This relies either on using the same globally unique URI identifiers for items or on creating mappings between the identifiers used in different datasets. To help publishers in this task, the following services are provided:

Services Offered to Other Computer Systems

We also support intelligent computer systems in automatically evaluating and accessing the datasets of the service. Most of the access mechanisms to the datasets provide data in RDF given suitable Accept headers. For example, dataset evaluation tools such as Aether store the descriptions they generate as RDF, and the SPARQL endpoints of the datasets provide a description of themselves in RDF when queried by a computer system. Naturally, all dataset URIs are able to provide their content as RDF. Complete graphs and datasets are also available for download at their URIs.
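Requesting RDF through an Accept header, as described above, might look roughly like the following sketch. It uses Python's standard urllib and only builds the request without sending it. The URI is a placeholder rather than a real dataset address, and text/turtle is assumed as the desired media type; which RDF media types a given endpoint actually serves is not stated here.

```python
from urllib.request import Request


def rdf_request(uri, mime_type="text/turtle"):
    """Build (but do not send) a GET request that asks the server for RDF."""
    return Request(uri, headers={"Accept": mime_type})


# Hypothetical resource URI, used for illustration only.
req = rdf_request("http://example.org/dataset/resource/1")

# To actually fetch the data, one would pass the request to
# urllib.request.urlopen(req) and read the response body.
```

The same function could be called with another RDF media type, such as application/rdf+xml, if a client prefers a different serialization.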