Linked Data Finland
Living Laboratory Data Service for the Semantic Web
This site is the Living Laboratory of the Linked Data Finland research initiative, conducted by the Semantic Computing Research Group at Aalto University in collaboration with the University of Helsinki and a large consortium of Finnish public organizations and companies.
Our goal is to make life easier for both publishers and consumers of structured data on the Web. We base our work on the Linked Data paradigm and its stack of standards, which combines an expressive, semantic data model (RDF) with standardized access mechanisms (SPARQL and live HTTP URIs).
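At the heart of this stack is the RDF data model, in which every fact is a subject–predicate–object triple and things are named with URIs. As a rough illustration only (the person URIs below are invented; the FOAF property URIs are real), the model and its basic pattern-matching query operation can be sketched in a few lines of Python:

```python
# Minimal sketch of the RDF triple model: each fact is a
# (subject, predicate, object) triple, and URIs name things.
# The person URIs are invented; the FOAF properties are real.
triples = [
    ("http://example.org/person/1", "http://xmlns.com/foaf/0.1/name", "Ada Lovelace"),
    ("http://example.org/person/1", "http://xmlns.com/foaf/0.1/knows", "http://example.org/person/2"),
    ("http://example.org/person/2", "http://xmlns.com/foaf/0.1/name", "Charles Babbage"),
]

def objects(subject, predicate):
    """Pattern-match triples -- the basic operation underlying SPARQL queries."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("http://example.org/person/1", "http://xmlns.com/foaf/0.1/name"))
# → ['Ada Lovelace']
```

SPARQL generalizes exactly this kind of triple pattern matching, with variables allowed in any of the three positions.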
5-star Linked Data
The baseline of our work is the 5-star Linked Data model, proposed originally by Tim Berners-Lee.
★ Make data available on the Web in whatever format.
★★ Make data available as structured data (e.g., Excel instead of an image scan of a table).
★★★ Use non-proprietary formats (e.g., CSV instead of Excel format).
★★★★ Use URIs to denote things, so that people can point at your data.
★★★★★ Link your data to other data to provide context.
7-star Linked Data Service
However, in our opinion, providing 5-star Linked Data is just the beginning. To actually make use of the datasets, consumers need more support in getting to know and access them, as well as a better grasp of their quality and provenance. To this end, we extend the model with two additional stars:
★★★★★★ Provide your data with a schema and documentation so that people can understand and re-use your data easily.
★★★★★★★ Validate your data and denote its provenance so that people can trust the quality of your data.
This added support should come with as little extra work as possible for the data publisher. Our hypothesis is that much of it can be provided automatically, building on the Linked Data core. A data publisher need only provide their data in RDF, and the LDF.fi portal will do the rest automatically. See the overview paper (in ESWC 2014 Proceedings, Springer-Verlag) for more details on the underlying ideas.
Further Information
On the left, you will find more information about the project and the datasets we are working on. Selecting a dataset brings up more information about the services related to it. Note that it is also possible to publish your own data on the service. The rest of this page describes in detail the breadth of services that can be tied to a dataset: first the services for data consumers, then those for data publishers, and finally the services provided to other computer systems.
Services Offered to Data Consumers
To test our general hypothesis, we have engaged with multiple communities to discover the services they need in order to 1) evaluate a dataset for fitness of purpose and 2) use the dataset efficiently. We have then adopted or developed tools on top of the LDF.fi platform to provide those services.
For getting to know a dataset, LDF.fi provides the following services:
- From schema definition data, human-readable documentation of the schema can be generated automatically by the LODE tool (e.g. the void-ext schema).
- For other data, we've created the vocab.at tool to generate human-readable documentation of how the schema is actually used in the dataset (e.g. in this dataset of WW1 events from the Imperial War Museum). As a first means to evaluate dataset quality, the service also provides a report on how well the dataset schema adheres to best practices for technical publication.
- Complementing vocab.at, a more visual overview of a dataset can be gained with the Aether tool, which, given a dataset, creates and visualizes a statistical description of the shape of that data (e.g. the same dataset of WW1 events). As regards data quality, the tool can be used to highlight outliers and peculiarities in the data (see e.g. the examples listed at the end of the page here).
- For visualizing the structural overlap between multiple datasets in order to evaluate ease of integration, the V2 tool can be used.
- For visualizing datasets or parts of them as graphs, the RDF Grapher tool can be used.
- Aside from these, any dataset can also simply be browsed and searched by an end user, as supported by the tools presented next.
For end users of a dataset, LDF.fi provides the following services:
- General browsing and search access is provided by the SAHA tool (e.g. the Finnish national epic of Kalevala)
- For datasets that are to be used as reference vocabularies, the Skosmos tool provides a targeted querying experience (e.g. the Finnish joint ontology KOKO)
- The RelFinder tool can be used to discover how two objects relate to each other (e.g. what connects Milan and London in the 18th Century?)
- Faceted search inside a dataset is made possible by a SPARQL widget (e.g. for MuseumFinland)
- A dataset can be used to provide context in a contextual reader application under development (e.g. for reading WW1 primary sources)
- For users who know SPARQL, further tools are available (in future research, we hope to provide these functionalities without the need for such knowledge):
- For manually running SPARQL queries, the YASGUI interface is provided (e.g. for querying which units of the German 3rd Army committed most atrocities in Belgium in the First World War)
- Results from SPARQL queries can be visualized using the VISU tool (e.g. here, places publishing disproportionate amounts of French philosophy in the 18th century fall on the right side of the diagonal)
- VISU is also able to pass query results on to Europeana4D for visualization (e.g. to explore some battles of WW1 in space and time)
- Results from a SPARQL query can also be loaded into the Palladio visualization tool (there is an example here)
- A dataset can be enriched via reasoning based on RDFS and OWL semantics with the OWL RL Reasoner tool, or with N3 rules with the N3 Logic Rule Reasoner tool
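To give a flavor of what such reasoning-based enrichment means, here is a minimal, hand-rolled sketch of two RDFS entailment rules in Python. The class and instance names are invented, and the real OWL RL Reasoner implements far more rules than these two:

```python
# Minimal sketch of RDFS reasoning over subclass hierarchies.
# Rule rdfs9:  if ?x rdf:type ?c1 and ?c1 rdfs:subClassOf ?c2, then ?x rdf:type ?c2.
# Rule rdfs11: rdfs:subClassOf is transitive.
# All class and instance names below are invented for the example.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    (":Battle", SUBCLASS, ":MilitaryEvent"),
    (":MilitaryEvent", SUBCLASS, ":Event"),
    (":battleOfMons", RDF_TYPE, ":Battle"),
}

def infer(triples):
    """Forward-chain rdfs9 and rdfs11 until no new triples appear (a fixpoint)."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))   # rdfs11: transitivity
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))   # rdfs9: type propagation
        if new <= inferred:
            return inferred
        inferred |= new

result = infer(triples)
print((":battleOfMons", RDF_TYPE, ":Event") in result)  # → True
```

The enriched dataset now states explicitly that the battle is an event, so a query for all events will find it without the query author knowing the class hierarchy.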
Most often, however, customized end-user applications have been created, building on the common APIs and using common modules, but tuned to the needs of particular use cases. For example:
- The BookSampo portal, developed in association with Finnish public libraries, provides access to fiction literature in the Finnish language.
- The HealthFinland portal was developed in association with the Finnish National Institute for Health and Welfare.
- The CultureSampo portal provides unified viewpoints into the heterogeneous collections of over 20 Finnish cultural heritage institutions.
- The MuseumFinland portal integrates collections from three Finnish museums.
- The BirdWatch mobile application supports amateur ornithologists in submitting high-quality bird observations based on already gathered data.
- The TourRDF demo highlights how linked cultural tour data can be visualized on a map interface, while the POI Finder demonstrates how such tours could be built using Linked Data.
Services Offered to Data Publishers
In addition to end users, we also support data publishers in converting their datasets to RDF, as well as in maintaining them and ensuring their quality.
For converting legacy datasets into RDF for publication in LDF.fi, either the Karma tool or the RDF export extension to OpenRefine can be used. Alternatively, technically experienced users may choose to use e.g. an RML processor alongside a mapping definition described in the RDF Mapping Language.
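As a rough sketch of what such a conversion does, the following Python snippet turns one row of a legacy CSV table into RDF triples in Turtle. The base URI and column names are invented for the example, and real tools like Karma or an RML processor do this declaratively and at scale:

```python
# Sketch of converting a legacy CSV table to RDF in Turtle.
# The base URI and CSV columns are invented examples; only the
# FOAF vocabulary prefix is real. Tools like Karma or an RML
# processor perform this kind of mapping declaratively.
import csv, io

legacy = io.StringIO("id,name\n1,Jean Sibelius\n")
BASE = "http://example.org/person/"  # invented base URI for minted identifiers

lines = ["@prefix foaf: <http://xmlns.com/foaf/0.1/> ."]
for row in csv.DictReader(legacy):
    subject = f"<{BASE}{row['id']}>"          # mint a URI from the row's id (4th star)
    lines.append(f'{subject} foaf:name "{row["name"]}" .')

turtle = "\n".join(lines)
print(turtle)
```

The key step is minting a stable URI for each row, which is what lifts the data from 3-star (plain CSV) toward 4- and 5-star Linked Data.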
For editing data stored in LDF.fi, the SAHA tool also provides an editing interface, used successfully in production both by the University of Colorado on the WW1LOD dataset and by dozens of volunteer librarians in the BookSampo project. People with a more direct understanding of RDF may also use the Snapper tool to edit their data.
As regards data quality, the vocab.at and Aether tools for evaluating datasets serve not only end users, but have also enabled content publishers themselves to discover errors and abnormalities in their data. Also, the RDF Grapher tool can be used to validate the syntax of RDF data, and to visually spot errors in the relations between data objects. The OWL RL Reasoner and N3 Logic Rule Reasoner tools can be used for discovering logical inconsistencies in the datasets.
For converting an RDF dataset from one format (e.g. Turtle, RDF/XML) to another, the RDF Serializer tool can be used. A similar service for converting OWL ontologies between formats (e.g. Turtle, OWL Functional Syntax) is the OWL Syntax Converter.
Finally, one of the promises of using Linked Data techniques for publishing datasets is to make those datasets easier to integrate with other data. This relies either on using the same globally unique URI identifiers for items, or on creating mappings between the identifiers used in different datasets. To help publishers in this task, the following services are provided:
- For discovering mappings between datasets already in RDF, the SILK tool can be used.
- For mapping references to shared URIs in legacy data, the reconciliation functionalities of OpenRefine can be used. Here, each dataset inside LDF.fi is able to masquerade as a reconciliation API to be used by OpenRefine (e.g. reconciling Bryce, James against the WW1LOD dataset).
- For integrating URI lookup into legacy database systems, either the Finto API or a ready-made JavaScript autocompletion widget can be used.
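As a sketch of the request shape in the OpenRefine reconciliation protocol, the snippet below builds (but does not send) a reconciliation call for the "Bryce, James" example. The endpoint URL is an invented placeholder, not a real LDF.fi address:

```python
# Sketch of an OpenRefine-style reconciliation request: the client
# sends a JSON object of named queries in a URL-encoded 'queries'
# parameter; the service answers with candidate matches per query.
# The endpoint URL below is an invented placeholder.
import json
from urllib.parse import urlencode

queries = {"q0": {"query": "Bryce, James"}}
endpoint = "http://example.org/reconcile"  # placeholder, not a real endpoint

url = endpoint + "?" + urlencode({"queries": json.dumps(queries)})
print(url)
```

A conforming service would respond with a JSON object keyed by the same query names (here `q0`), each holding a ranked list of candidate URIs with match scores.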
Services Offered to Other Computer Systems
We also support intelligent computer systems in automatically evaluating and accessing the datasets of LDF.fi. Most of the access mechanisms provide data in RDF given suitable Accept headers. For example, both the vocab.at and Aether tools store the descriptions they generate as RDF, and the SPARQL endpoints of the datasets provide a description of themselves in RDF when queried by a computer system (e.g. http://ldf.fi/ww1lod/sparql). Naturally, all dataset URIs provide their content as RDF (e.g. http://ldf.fi/ww1lod/a74d369d). Complete graphs and datasets are also available for download at their URIs (e.g. http://ldf.fi/ww1lod/iwm/).
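As a minimal sketch of this content negotiation from a client's point of view, the snippet below constructs (but, to stay self-contained, does not send) an HTTP request asking for Turtle for the example resource URI mentioned above:

```python
# Sketch of content negotiation: a client asks for RDF (Turtle) for a
# resource URI by setting the Accept header. The request object is
# only constructed here, not sent, so no network access is needed.
from urllib.request import Request

req = Request(
    "http://ldf.fi/ww1lod/a74d369d",
    headers={"Accept": "text/turtle"},
)
print(req.get_header("Accept"))  # → text/turtle
```

Sending the same request with a browser's default Accept header would instead yield a human-readable page, which is the point of serving both audiences from one URI.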