Continuing the work done to promote the CPV as a linked dataset, we have finished the first beta release of new product scheme classifications (PSCs) as linked data in the context of e-procurement. The following diagram shows the ongoing work on the transformation of PSCs (the gray ones are not yet transformed):
The process of promoting all these PSCs (more information can be found in pscs-catalogue at thedatahub.org) has been carried out stepwise (similar to http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook):
- Select the PSCs to be transformed and download the data source (MS Excel in most cases)
- Model the information about a PSC item using existing vocabularies and design the URIs. If required, new concepts and relations can be defined, as in the CPV case.
- Transform the data using Google Refine
- Create the mappings between a PSC and the Product Ontology (custom Java-based reconciler adapted to the descriptions of PSC items)
- Create the mappings between a PSC and the CPV 2008 (custom Java-based reconciler between a source PSC and a target PSC)
- Validate mappings and links
- Add dataset descriptions using the VoID vocabulary
- Store in Virtuoso and publish data with Pubby
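The mapping steps above can be illustrated with a small sketch. This is not the custom Java-based reconciler mentioned in the list (which relies on Apache Lucene and Solr); it swaps in a plain token-overlap (Jaccard) score, written in Python, to show the general idea of linking items of a source PSC to items of a target PSC by comparing their descriptions. The function names, the threshold and the sample identifiers and descriptions are all hypothetical.

```python
import re


def tokens(text):
    """Lower-case a description and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity between two descriptions, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_matches(source, target, threshold=0.5):
    """Link each source item to its most similar target item.

    `source` and `target` map item ids to descriptions. A link is kept only
    when its score reaches `threshold` (an arbitrary value for this sketch).
    """
    links = []
    for sid, sdesc in source.items():
        scored = [(jaccard(sdesc, tdesc), tid) for tid, tdesc in target.items()]
        if not scored:
            continue
        score, tid = max(scored)
        if score >= threshold:
            links.append((sid, tid, round(score, 2)))
    return links


# Hypothetical items, not taken from any published PSC.
source_psc = {"s1": "Computer equipment and supplies"}
target_psc = {"t1": "Computer equipment", "t2": "Office furniture"}

for sid, tid, score in best_matches(source_psc, target_psc):
    print(sid, "->", tid, "score", score)
```

The score attached to each link plays the role of the "confidence" value planned for the next release; a full-text engine would normally replace the naive token comparison used here.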
A PSC item (?product) is defined by the following URI patterns and properties:
- URI for datasets: http://purl.org/weso/pscs/{psc}/{year|version}/resource/ds
- URI for resources: http://purl.org/weso/pscs/{psc}/{year|version}/resource/{id}
- URI for classes and properties: http://purl.org/weso/pscs/{psc}/{year|version}/ontology/
- rdf:type <pscs:PSCConcept> (rdf:type skos:Concept)
- dcterms:identifier “id” (the id that is part of the URI)
- skos:notation “raw id” (the real id that appears in the data source)
- skos:prefLabel, gr:description and rdfs:label “description”
- skos:inScheme <void:Dataset>, <skos:ConceptScheme>
- skos:broaderTransitive/skos:narrowerTransitive <PSCConcept> (in some cases the broader concept of an item cannot be inferred from the codes; for those cases we have defined a custom property called “pscs:level”)
- pscs:relatedMatch (mapping between ?product and items of ProductOntology). The next release will include a “confidence” value to establish the weight of each match.
- skos:exactMatch <PSCConcept> (some PSCs already define mappings among themselves; we reuse this information)
- skos:closeMatch <PSCConcept> (mapping between ?product and items of CPV 2008). The next release will include a “confidence” value to establish the weight of each match.
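As an illustration of the URI patterns and properties listed above, the following Python sketch builds the URIs for a PSC item and prints a Turtle description of it. It uses plain string formatting rather than an RDF library. The namespace bound to the pscs: prefix is a placeholder (the real one is not stated here), and the item id, raw id, label and broader id are illustrative values rather than data copied from the published datasets.

```python
BASE = "http://purl.org/weso/pscs"

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "gr": "http://purl.org/goodrelations/v1#",
    # Assumption: placeholder namespace for the pscs: prefix.
    "pscs": "http://example.org/pscs/ontology/",
}


def dataset_uri(psc, version):
    return f"{BASE}/{psc}/{version}/resource/ds"


def resource_uri(psc, version, item_id):
    return f"{BASE}/{psc}/{version}/resource/{item_id}"


def ontology_uri(psc, version):
    return f"{BASE}/{psc}/{version}/ontology/"


def literal(text, lang=None):
    """Format a Turtle string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def describe_item(psc, version, item_id, raw_id, label, broader_id=None):
    """Return a Turtle snippet describing one PSC item."""
    desc = literal(label, "en")
    statements = [
        "a pscs:PSCConcept, skos:Concept",
        f"dcterms:identifier {literal(item_id)}",
        f"skos:notation {literal(raw_id)}",
        f"skos:prefLabel {desc}",
        f"gr:description {desc}",
        f"rdfs:label {desc}",
        f"skos:inScheme <{dataset_uri(psc, version)}>",
    ]
    if broader_id is not None:
        broader = resource_uri(psc, version, broader_id)
        statements.append(f"skos:broaderTransitive <{broader}>")
    header = "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())
    body = " ;\n    ".join(statements)
    return f"{header}\n\n<{resource_uri(psc, version, item_id)}>\n    {body} .\n"


# Illustrative values only.
print(describe_item("cpv", "2008", "30200000", "30200000-1",
                    "Computer equipment and supplies", broader_id="30000000"))
```

Mapping properties such as skos:closeMatch or pscs:relatedMatch could be appended to the statement list in the same way once the target URIs are known.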
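The datasets available in this release are listed below (dataset URI, base URI and the count reported for each):
- http://purl.org/weso/pscs/cn/2012/resource/ds, http://purl.org/weso/pscs/cn/2012, 137,484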
- http://purl.org/weso/pscs/cpa/2008/resource/ds, http://purl.org/weso/pscs/cpa/2008, 92,749
- http://purl.org/weso/pscs/cpc/2008/resource/ds, http://purl.org/weso/pscs/cpc/2008, 100,819
- http://purl.org/weso/pscs/cpv/2003/resource/ds, http://purl.org/weso/pscs/cpv/2003, 546,135
- http://purl.org/weso/pscs/cpv/2008/resource/ds, http://purl.org/weso/pscs/cpv/2008, 803,311
- http://purl.org/weso/pscs/isic/v4/resource/ds, http://purl.org/weso/pscs/isic/v4, 18,986
- http://purl.org/weso/pscs/naics/2007/resource/ds, http://purl.org/weso/pscs/naics/2007, 36,292
- http://purl.org/weso/pscs/naics/2012/resource/ds, http://purl.org/weso/pscs/naics/2012, 35,390
- http://purl.org/weso/pscs/sitc/v4/resource/ds, http://purl.org/weso/sitc/v4, 70,887
The definitions have been made using the vocabularies:
The whole linkset contains links to other datasets (151,102):
- GoodRelations and Product Ontology products and descriptions
To create all this data we have used several tools:
- Google Refine and the RDF extension (to produce data)
- Apache Lucene and Solr (to reconcile concepts)
- Pubby (to publish data)
- OpenLink Virtuoso (to store data)
Collaborators:
- José Emilio Labra (Main Researcher of WESO Research Group at the University of Oviedo)
- Jose Luis Marín (Euroalert.net)
- The first version of the CPV was developed in 2007 in conjunction with my colleagues at CTIC, Luis Polo and Emilio Rubiera.
Note:
The initial version of the CPV as linked data is still available to ensure backward compatibility.
TO DO List
- Example queries
- Confidence value in mappings
- Check broken links
- Link to other datasets, fix names (case sensitive)
- Reconcile all products and services with DBpedia resources
- Update public procurement notices with the new URIs