in Research

Compiling Related Work about Linked Data Quality

One of the cornerstones to boost the use of Linked Data is to ensure the quality of data according to different terms like timely, correctness, etc. The intrinsic features of this initiative provide a framework for the distributed publication of data and resources (linking together datasources on the web). Due to this open approach some mechanisms should be added to check if data is well linked or it is just a try to link together some part of the web. Most of the cases of linking data use an automatic way to discover and create links between resources (e.g. Silk Framework), this situation implies that the process is, in some factors, ambiguous so human decision is required. In the case of the data, the quality may vary as information providers have different levels of knowledge, objectives, etc. Thus information and data are released in order to accomplish a specific task and their quality should be assessed depending on different criteria according to a specific domain.

For instance, a data provider is releasing information about payments, is it possible to check which is the decimal separator, 10,000 or 10.000? is this information homogenous across all resources in the dataset?. If a literal value should be “Oviedo”, what happen if the real value is “Obiedo”? How we can detect and fix these situations?

These cases have motivated some related work:

  • The PhD thesis of Christian Bizer that purposes a template language and a framework (WIQA) to detect if a triple fulfills the requirements to be accepted in a dataset. (2007)
  • LODQ vocabulary, is a RDF model to express criteria about 15 kind of metrics that have been formulated by Glenn McDonald in a mailing list. A processor of this vocabulary is still missing. (2011)
  • A paper entitled “Linked Data Quality Assessment through Network Analysis” by Christian Gueret, in which some metrics are provided to check the quality of links. This work is part of  the LATC project.  (2011)
  • The workshop COLD (Consuming Linked Data) is also a good start point to check problems and approaches to deal with the requirements of implementing linked data applications.
  • …that are collected in the aforementioned works.
In some sense we should think that this problem is new but the truth is that it is inherited from the traditional databases. One of the arising questions is the possibility of applying existing approaches to solve the assessment of quality in the linked data realm…but this will be evaluated in next posts.
This first post is just a short introduction to the linked data quality research and approaches. In next weeks, we try to review in depth these works and purpose a solution (LODQAM).
Thank you very much!
Excellent regards,

Write a Comment

Comment