Validation

Linked Data Finland

The validation of Linked Data is often neglected due to the distributed nature of RDF and the open world assumption. However, validating Linked Data is a crucial factor for the usability and re-use of the data. Most structured data representations, such as XML or JSON, offer ways for domain-specific data validation, for example XML Schema, Schematron, or JSON Schema. Linked Data, in contrast, is often published without any validation, following the "publish now, refine later" delusion, which hampers the re-use of the data.

Poor quality stems from bad practices and a lack of data discipline. Linked Data is no exception: like any other data, it should always be validated against domain-specific rules to ensure good quality. Context-dependent validation of RDF can be accomplished by using SPIN (a SPARQL-based rule and constraint language) with any triplestore, or by using commercial triplestores that implement data validation themselves.
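As a minimal sketch of what such a domain-specific rule looks like, the following pure-Python fragment checks a SPIN-style "required property" constraint over a toy set of triples. The data, prefixes, and function names are illustrative, not part of any actual ldf.fi dataset; a real deployment would run the equivalent SPARQL query inside a triplestore.

```python
# Toy dataset: (subject, predicate, object) triples. All names are
# illustrative examples, not real ldf.fi data.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob",   "rdf:type", "foaf:Person"),
    # ex:bob lacks the required foaf:name -> constraint violation
]

def check_required_property(triples, cls, prop):
    """Domain-specific rule: every instance of `cls` must have `prop`.

    Mirrors a SPARQL constraint of the form:
      SELECT ?s WHERE { ?s rdf:type <cls> .
                        FILTER NOT EXISTS { ?s <prop> ?o } }
    Returns the violating resources, sorted.
    """
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    described = {s for s, p, o in triples if p == prop}
    return sorted(instances - described)

violations = check_required_property(triples, "foaf:Person", "foaf:name")
print(violations)  # -> ['ex:bob']
```

The same pattern generalises to cardinality, datatype, and value-range rules: each rule is a query that selects the resources violating it, so an empty result means the dataset passes.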

Datasets published as Linked Data may be constructed from various sources using automated procedures, which can make the contained data structures hard to comprehend even for the data curators. Creating validation rules for such datasets is only possible if the vocabulary in use and the potential causes of errors are known. The validation of datasets published in ldf.fi therefore follows two principles: first document, then validate. Documentation for the datasets is produced with the vocab.at service, which creates it automatically. The validation rules are then constructed using the information about the data structures in use and the amount of data.
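To illustrate the "first document, then validate" idea, the sketch below uses predicate-usage counts, of the kind an automatic documentation service might report, to flag rarely used predicates as candidate vocabulary typos before stricter rules are written. All counts and predicate names here are invented for illustration.

```python
# Hypothetical predicate-usage counts, as automatically generated
# documentation might report them for a dataset.
predicate_counts = {
    "dct:title":   12840,
    "dct:creator": 12532,
    "dct:craetor":     3,   # probable typo of dct:creator
}

def suspicious_predicates(counts, threshold=0.001):
    """Flag predicates whose share of all usages falls below `threshold`.

    A predicate used only a handful of times in a large dataset is
    often a misspelling of a common one, and thus a good target for
    a validation rule.
    """
    total = sum(counts.values())
    return sorted(p for p, n in counts.items() if n / total < threshold)

print(suspicious_predicates(predicate_counts))  # -> ['dct:craetor']
```

Heuristics like this do not replace domain-specific rules; they only point a curator at the parts of the vocabulary that need rules first.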