Compiling Related Work about Linked Data Quality

A cornerstone of wider Linked Data adoption is ensuring data quality along dimensions such as timeliness and correctness. The intrinsic features of this initiative provide a framework for the distributed publication of data and resources (linking together data sources on the web). Because of this open approach, mechanisms are needed to check whether data is well linked or whether a link is just an attempt to tie together some part of the web. Most linking efforts discover and create links between resources automatically (e.g. with the Silk Framework), so the process is in some respects ambiguous and human decisions are still required. The quality of the data itself may also vary, since information providers have different levels of knowledge, objectives, etc. Information and data are therefore released to accomplish a specific task, and their quality should be assessed against criteria specific to a domain.

For instance, if a data provider releases information about payments, is it possible to check which decimal separator is used, 10,000 or 10.000? Is this information homogeneous across all resources in the dataset? If a literal value should be “Oviedo”, what happens if the actual value is “Obiedo”? How can we detect and fix these situations?
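The two questions above can be illustrated with a minimal sketch using only the Python standard library. It is not taken from any of the works listed in this post: the sample literals, the `known_places` vocabulary, the function names and the 0.8 similarity cutoff are all assumptions chosen for illustration, and the separator check is a heuristic (a value such as 12.500 remains genuinely ambiguous without further context).

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical numeric literals, as they might appear in a payments dataset.
amounts = ["10,000", "12.500", "7,250", "1,000.50"]

def separator_style(literal):
    """Heuristically classify the digit-grouping convention of a literal."""
    if re.fullmatch(r"\d{1,3}(,\d{3})+(\.\d+)?", literal):
        return "comma-grouped"   # e.g. 10,000 or 1,000.50
    if re.fullmatch(r"\d{1,3}(\.\d{3})+(,\d+)?", literal):
        return "dot-grouped"     # e.g. 10.000 or 1.000,50
    return "unknown"

# Count how many literals follow each convention.
styles = Counter(separator_style(a) for a in amounts)
# The dataset is homogeneous only if a single convention is in use.
is_homogeneous = len(styles) == 1

# Hypothetical controlled vocabulary of expected literal values.
known_places = ["Oviedo", "Gijón", "Avilés"]

def suggest(literal, vocabulary=known_places, cutoff=0.8):
    """Return the closest known value for a possibly misspelled literal, or None."""
    matches = get_close_matches(literal, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(styles, is_homogeneous)
print(suggest("Obiedo"))
```

A check like this only flags candidates. Deciding whether “Obiedo” is really a misspelling of “Oviedo” still needs a human decision or domain-specific rules.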

These cases have motivated some related work:

  • The PhD thesis of Christian Bizer, which proposes a template language and a framework (WIQA) to detect whether a triple fulfills the requirements to be accepted in a dataset. (2007)
  • The LODQ vocabulary, an RDF model to express criteria for 15 kinds of metrics formulated by Glenn McDonald in a mailing list. A processor for this vocabulary is still missing. (2011)
  • A paper entitled “Linked Data Quality Assessment through Network Analysis” by Christian Gueret, which provides some metrics to check the quality of links. This work is part of the LATC project. (2011)
  • The COLD (Consuming Linked Data) workshop is also a good starting point to review problems and approaches for meeting the requirements of implementing linked data applications.
  • …that are collected in the aforementioned works.
We might think this problem is new, but in truth it is inherited from traditional databases. One question that arises is whether existing approaches can be applied to quality assessment in the linked data realm…but this will be evaluated in upcoming posts.
This first post is just a short introduction to linked data quality research and approaches. In the coming weeks, we will try to review these works in depth and propose a solution (LODQAM).
Thank you very much!
Best regards,

Editor & Reviewer

As part of my work as a researcher, I also serve as editor, reviewer and program committee member for several journals and conferences.

Other research works

These works are related to the research projects I am involved in.

  1. MOLDEAS: Methods On Linked Data for E-procurement Applying Semantics. PhD Dissertation (Spanish).  Supervised by: Dr. José Emilio Labra Gayo. [PDF | 12 MB]
  2. TKM (Technological Knowledge Management) Ontology. ORIGIN Project. Jose María Álvarez Rodríguez (UNIOVI), Pablo Abella Vallina (UNIOVI) and José Emilio Labra (UNIOVI). December 2010.
  3. Interoperability and Integration in Service Oriented Architecture Applying Semantics. Research Work. Supervised by: Dr. Antonio Manuel Campos López. September 2009.
  4. D7.2 Dissemination Strategy. V1. ONTORULE Project. Jose María Álvarez Rodríguez (CTIC), Lian Shi (CTIC), Diego Berrueta Muñoz (CTIC), Luis Polo Paredes (CTIC), Antonio Campos López (CTIC), Koen Van Leuwen (PNA), Thomas Eiter (TUWIEN), Roman Korf (Ontoprise), Emmanuelle Passot (IBM). June 2009.
  5. D3.1 Especificación de la arquitectura de la plataforma genérica SILO. PRIMA Project. Jose María Álvarez Rodríguez (CTIC), Miguel García Rodríguez (CTIC), Diego Berrueta Muñoz (CTIC) and Luis Polo Paredes (CTIC). Confidential. April 2009.
  6. D1.1 Especificación de requisitos y arquitectura de alto nivel, v1.0. PRIMA Project. Jose María Álvarez Rodríguez (CTIC), Miguel García Rodríguez (CTIC), Diego Berrueta Muñoz (CTIC) and Luis Polo Paredes (CTIC). Confidential. December 2008.
  7. Technical Report. PRAVIA Project. Jose María Álvarez Rodríguez (CTIC), Miguel García Rodríguez (CTIC), Diego Berrueta Muñoz (CTIC) and Luis Polo Paredes (CTIC). Confidential. December 2007.
  8. D8.1 Evaluation of the semantic web services platforms. PRAVIA project. Jose María Álvarez Rodríguez (CTIC). Confidential. August 2007.

Research projects

Through my activities as a computer science researcher I am (or was) involved in several research projects within the main research programmes at regional, national and European scope. The most relevant projects are listed below:

European scope

  1. AHTools (Arrowhead Tools for Engineering of Digitalisation Solutions). H2020-ECSEL (Grant Agreement No 826452). 2019-2022. Main Researcher: Jose María Alvarez-Rodríguez. See Press release.
  2. NewControl (Integrated, Fail-Operational, Cognitive Perception, Planning and Control Systems for Highly Automated Vehicles). H2020-ECSEL (selected). 2019-2022. Main Researcher: Anabel Fraga. See Press release.
  3. Internet of DevOps. Celtic+. 2018-2021. Main Researcher: Jose María Alvarez-Rodríguez.
  4. IMOLA II-Interoperability using knowledge management among land registry domain in Europe. JUST AG-2016-05 / GRANT AGREEMENT Nº 764350. Main Researcher:  Dra. Anabel Fraga.
  5. Cross-Nature: Cross Harmonization & Exploitation of NATURE DataSets. INEA/CEF/ICT/A2016/1297261. 2017-2019. Main Researcher: Javier García Guzmán.
  6. AMASS (Architecture-driven, Multi-concern and Seamless Assurance and Certification of Cyber-Physical Systems). H2020-ECSEL. 2016-2019. Main Researcher: Dr. Jose Luis de la Vara
  7. CRitical sYSTem engineering AcceLeration (CRYSTAL). ARTEMIS Call 2012. 2013-2016. Main Researcher:  Dra. Anabel Fraga
  8. Trans-European Research Training Network on Engineering and Provisioning of Service-Based Cloud Applications. FP7 Marie Curie Initial Training Network “RELATE”. 7º Framework Programme. 2013-2015.
  9. Multilingual Web Thematic Network. 7º Framework Programme. ICT PSP n.º 250500. Researcher. 2011. Coordinator: W3C.
  10. ONTORULE-ONTOlogies meet business RULEs. 7º Framework Programme. FP7-ICT-2007.4.4, ref. 231875. Leader of workpackage 7: “Dissemination and Training”. 2009-2011. Main Researcher: IBM France.
  11. Others: Celtic+ MyMobileWeb.

National scope

  1. ENERSI-Plataforma Servicios Energéticos A Partir De Datos De Múltiples Fuentes Integrados. Spanish Ministry of  Economy and Industry.  2014-2017. Main Researcher: Prof. Dr. Juan Miguel Gómez Berbís
  2. ROCAS: Reasoning On the Cloud Applying Semantics. Spanish Ministry of Science and Innovation. TIN2011-27871. 2012. Main Researcher: Dr. José Emilio Labra (University of Oviedo).
  3. 10ders Information Services. Plan Avanza 2. Spanish Ministry of Industry, Tourism and Commerce. TSI-020100-2010-919. 2011. Main Researcher: Gateway S.C.S.
  4. ORIGIN: ORganizaciones Inteligentes Globales INnovadoras. Centre for Industrial Technological Development. 2010-2012. Main Researcher: Indra Software Labs.
  5. Spanish Thematic Network of Linked Data. Spanish Ministry of Science and Innovation. TIN2010-10811-E. 2011. Coordinator: Dr. Óscar Corcho (UPM).
  6. PRIMA: Plataforma de Recursos de Información y Movilidad para el sector Asegurador. Plan Avanza I+D. Spanish Ministry of Industry, Tourism and Commerce. TSI-020302-2008-032. 2008-2010. Main Researcher: Telefónica R&D.
  7. VULCANO. Plan Avanza I+D. Spanish Ministry of Industry, Tourism and Commerce. 350503-2007-7. 2007-2009. Main Researcher: ATOS Origin.
  8. MORFEO-EzWeb. Plan Avanza I+D. Spanish Ministry of Industry, Tourism and Commerce. TSI-340503-2007-02. 2007-2009. Main Researcher: Telefónica R&D.
  9. MyMobileWeb (CELTIC project). Plan Avanza I+D. Spanish Ministry of Industry, Tourism and Commerce. TSI-020301-2008-25. 2007-2009. Main Researcher: Telefónica R&D.
  10. MORFEO-MyMobileWeb (CELTIC project). Plan Avanza I+D. Spanish Ministry of Industry, Tourism and Commerce. TSI-350400-2006-20. 2006-2007. Main Researcher: Telefónica R&D.

Regional scope

  1. ORBITA (New Approaches to Linked Data Visualization). Regional Plan of Science, Technology and Innovation, Asturias 2006-2009. 2012-2013. Main Researcher: Treelogic S.L.
  2. RETINAS (REal TIme video ANalysis for Security applications). Regional Plan of Science, Technology and Innovation, Asturias 2006-2009. 2011-2013. Main Researcher: Treelogic S.L.
  3. SAITA. Regional Plan of Science, Technology and Innovation, Asturias 2006-2009. cod. 208001. 2008-2009. Main Researcher: Felguera TI.
  4. PRAVIA: Plataforma de Recursos de Acceso Virtual a la Información del Sector Asegurador. Regional Plan of Science, Technology and Innovation, Asturias 2006-2009. cod. IE05-172. 2007. Main Researcher: E2000 Nuevas Tecnologías.
  5. BOPA. Principado de Asturias, Estrategia e-Asturias 2007. cod. 06-096. 2005-2006. Main Researcher: Fundación CTIC.

Others: private contracts, research agreements, etc.

  1. Social Media Radar. RTVE. 2017-2018. Main Researcher: Dra. Anabel Fraga.
  2. “Functional Analysis of a stock management system”. Felguera TI. 2017. Main Researcher: Dr. Jose María Alvarez-Rodríguez
  3. P2PMONEY. Plataforma para el intercambio de divisas basado en modelo de transformación digital. Vector Software SL. 2017-2019. Main Researcher: Prof. Dr. Juan Miguel Gómez Berbís.
  4. Diseño y desarrollo de motor semántico. FUNDACIÓN PARA LA INVESTIGACIÓN BIOMÉDICA DEL HOSPITAL UNIVERSITARIO RAMÓN Y CAJAL. 2016-2018. Main Researcher: Prof. Dr. Juan Llorens.
  5. Research Agreement  (“Cátedra de Investigación”) between RTVE and UC3M to boost research in the field of Big Data, Linked Data, Complex network analysis, Natural Language Processing, etc. applied to social network analysis. (October 2015-now). Main Researcher: Prof. Dr. Juan Llorens
  6. Research Agreement (“Cátedra de Investigación”) between TRC and UC3M to boost research in the field of systems engineering and interoperability. (January 2017-now). Main Researcher:  Dra. Anabel Fraga
  7. Magnus: Big data como soporte a la toma de decisiones legales. Tirant LoBlanch SL. 2015-2016. Main Researcher:  Prof. Dr. Juan Llorens
  8. The Webindex project. Publishing statistical data following the Linked Open Data Principles. 2012.
  9. Freews. On-line platform for distributing, sharing and searching video news applying semantic technologies. 2012. Main Researcher: José Emilio Labra Gayo.
  10. Linked Data Portal of the Library of Congress of Chile. 2012. Main Researcher: José Emilio Labra Gayo.

University

  1. “ONE-MINUTE QUIZZ: retroalimentación continua y en tiempo real de conceptos básicos de ingeniería del software mediante cuestionarios sencillo.” UNIVERSIDAD CARLOS III DE MADRID. 2017. Main Researcher: Prof. Dr. Gonzalo Génova Fuster.
  2. “Metodología Lego® SeriousPlay® para mejorar la participación y consolidar el aprendizaje de conceptos.” UNIVERSIDAD CARLOS III DE MADRID. 2016. Main Researcher: Prof. Dr. Maribel Sánchez-Segura.
  3. Assessing the educational quality by means of processing surveys applying an automatic psychological evaluation system. UNIVERSIDAD EUROPEA DE MADRID-2012 UEM 19.  Main Researcher: Agustín Martínez Molina. 2013.