Product Scheme Classifications

Ongoing PSCs

Following on from our work to promote the CPV as a linked dataset, we have finished the first beta release of new product scheme classifications (PSCs) as linked data in the context of e-procurement. The following diagram shows the ongoing work on the transformation of PSCs (gray ones have not yet been transformed):

The process of promoting all these PSCs (more information can be found in the pscs-catalogue dataset at thedatahub.org) has been carried out step by step (similar to http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook):

  1. Select the PSCs to be transformed and download the data source (MS Excel in most cases)
  2. Model the information about a PSC item using existing vocabularies. If required, new concepts and relations can be defined, as in the CPV case. Design the URIs.
  3. Transform the data using Google Refine
  4. Create the mappings between a PSC and the Product Ontology (custom Java-based reconciler adapted to the descriptions of PSC items)
  5. Create the mappings between a PSC and the CPV 2008 (custom Java-based reconciler between a source PSC and a target PSC)
  6. Validate mappings and links
  7. Add dataset descriptions using VoID vocabulary
  8. Store in Virtuoso and publish data with Pubby
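
Steps 4 and 5 rely on custom Java-based reconcilers whose code is not included in this post. To make the idea concrete, here is a minimal sketch only, not the WESO reconciler: it scores a PSC item description against candidate target labels with token-set Jaccard similarity and keeps the best candidate above a threshold. The similarity measure, the 0.5 threshold, the function names and the example.org URIs are all assumptions for illustration.

```python
import re
from typing import Dict, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-case a description and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reconcile(
    description: str, candidates: Dict[str, str], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (target_uri, score) for the best-scoring candidate label, or None.

    `candidates` maps a target URI (e.g. a CPV 2008 concept) to its label.
    The score could play the role of the "confidence" value planned for the
    next release.
    """
    source = tokens(description)
    best_uri, best_score = None, 0.0
    for uri, label in candidates.items():
        score = jaccard(source, tokens(label))
        if score > best_score:
            best_uri, best_score = uri, score
    if best_uri is None or best_score < threshold:
        return None
    return best_uri, best_score


if __name__ == "__main__":
    # Hypothetical target labels, for illustration only.
    targets = {
        "http://example.org/target/a": "Office furniture",
        "http://example.org/target/b": "Medical equipment",
    }
    print(reconcile("Furniture for office use", targets))
    # ('http://example.org/target/a', 0.5)
```

Token overlap is cheap and easy to inspect; a production reconciler would probably also need stemming, stop-word handling and multilingual labels, which this sketch leaves out.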

The definition of a PSC item (?product) comprises the following properties:

  • URI for datasets: http://purl.org/weso/pscs/{psc}/{year|version}/resource/ds
  • URI for resources: http://purl.org/weso/pscs/{psc}/{year|version}/resource/{id}
  • URI for classes and properties: http://purl.org/weso/pscs/{psc}/{year|version}/ontology/
  • rdf:type <pscs:PSCConcept> (rdf:type skos:Concept)
  • dcterms:identifier “id” (the id that is part of the URI)
  • skos:notation “raw id” (the real id that appears in the data source)
  • skos:prefLabel, gr:description and rdfs:label “description”
  • skos:inScheme <void:Dataset>, <skos:ConceptScheme>
  • skos:broaderTransitive/skos:narrowerTransitive <PSCConcept> (in some cases the broader item cannot be inferred from the codes; in those cases we have defined a custom property called “pscs:level”)
  • pscs:relatedMatch (mapping between ?product and items of the Product Ontology). The next release will include a “confidence” value to establish the weight of the matches.
  • skos:exactMatch <PSCConcept> (some PSCs already define mappings among themselves; we reuse this information)
  • skos:closeMatch <PSCConcept> (mapping between ?product and items of CPV 2008). The next release will include a “confidence” value to establish the weight of the matches.
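
The URI patterns and properties above can be combined into a small generator of N-Triples for a single item. This is a minimal sketch under stated assumptions: the PSCConcept class is placed in the per-scheme ontology namespace (.../ontology/PSCConcept), only the typing and literal-valued properties plus skos:inScheme are emitted (no broader or mapping links), and the scheme name "example", its version and the sample item are hypothetical.

```python
# Standard namespaces used by the properties listed above.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"
GR = "http://purl.org/goodrelations/v1#"
BASE = "http://purl.org/weso/pscs"


def dataset_uri(psc: str, version: str) -> str:
    return f"{BASE}/{psc}/{version}/resource/ds"


def resource_uri(psc: str, version: str, item_id: str) -> str:
    return f"{BASE}/{psc}/{version}/resource/{item_id}"


def ontology_uri(psc: str, version: str, local_name: str) -> str:
    # Assumption: PSCConcept is a term in the per-scheme ontology namespace.
    return f"{BASE}/{psc}/{version}/ontology/{local_name}"


def literal(text: str) -> str:
    """Quote a plain literal, escaping characters N-Triples forbids in raw form."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def item_triples(psc, version, item_id, raw_id, description):
    """N-Triples lines for one PSC item (a subset of the properties above)."""
    s = f"<{resource_uri(psc, version, item_id)}>"
    label = literal(description)
    pairs = [
        (RDF_TYPE, f"<{ontology_uri(psc, version, 'PSCConcept')}>"),
        (RDF_TYPE, f"<{SKOS}Concept>"),
        (f"{DCTERMS}identifier", literal(item_id)),
        (f"{SKOS}notation", literal(raw_id)),
        (f"{SKOS}prefLabel", label),
        (f"{GR}description", label),
        (f"{RDFS}label", label),
        (f"{SKOS}inScheme", f"<{dataset_uri(psc, version)}>"),
    ]
    return [f"{s} <{p}> {o} ." for p, o in pairs]
```

For instance, item_triples("example", "2010", "001", "00.1", "Example item") returns eight lines, the first being the rdf:type triple for http://purl.org/weso/pscs/example/2010/resource/001; lines in this form can be bulk-loaded into a store such as Virtuoso.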

The whole linkset of PSCs can be found at http://purl.org/weso/pscs/, and we have also extracted some statistics (PSC void:Dataset, graph IRI and number of triples):

Old-Fashioned Common Procurement Vocabulary 2008 and 2003

The Common Procurement Vocabulary (CPV) establishes a single classification system for public procurement aimed at standardising the references used by contracting authorities and entities to describe the subject of procurement contracts.

The CPV consists of a main vocabulary for defining the subject of a contract, and a supplementary vocabulary for adding further qualitative information. The main vocabulary is based on a tree structure comprising codes of up to 9 digits (an 8-digit code plus a check digit) associated with a wording that describes the type of supplies, works or services forming the subject of the contract.

  • The first two digits identify the divisions (XX000000-Y);
  • The first three digits identify the groups (XXX00000-Y);
  • The first four digits identify the classes (XXXX0000-Y);
  • The first five digits identify the categories (XXXXX000-Y);

Each of the last three digits gives a greater degree of precision within each category. A ninth digit serves to verify the previous digits.
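
Because the level of a main-vocabulary code is encoded in its trailing zeros, the division/group/class/category structure and the broader code can be derived from the digits alone. The sketch below assumes that the parent of a code is obtained by zeroing its last significant digit; it does not verify the check digit, and the label used for codes finer than a category is a name made up for this example.

```python
import re
from typing import List, Optional, Tuple

# Names for the levels described above, keyed by the number of significant digits.
LEVELS = {2: "division", 3: "group", 4: "class", 5: "category"}
CPV_CODE = re.compile(r"^(\d{8})(?:-(\d))?$")


def parse_cpv(code: str) -> Tuple[str, Optional[str]]:
    """Split 'XXXXXXXX-Y' (or a bare 'XXXXXXXX') into the 8 digits and the check digit.

    The check digit is returned unchanged; it is not verified here.
    """
    match = CPV_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a CPV main-vocabulary code: {code!r}")
    return match.group(1), match.group(2)


def significant_length(digits: str) -> int:
    """Leading digits that carry meaning; trailing zeros are padding (minimum 2)."""
    return max(2, len(digits.rstrip("0")))


def level(digits: str) -> str:
    n = significant_length(digits)
    return LEVELS.get(n, f"below category ({n - 5} extra digit(s))")


def broader(digits: str) -> Optional[str]:
    """Parent code, assuming it is obtained by zeroing the last significant digit."""
    n = significant_length(digits)
    if n == 2:
        return None  # divisions are the top level
    return digits[: n - 1] + "0" * (9 - n)


def ancestors(digits: str) -> List[str]:
    """All broader codes, from the immediate parent up to the division."""
    chain = []
    parent = broader(digits)
    while parent is not None:
        chain.append(parent)
        parent = broader(parent)
    return chain


if __name__ == "__main__":
    digits, check = parse_cpv("03111100")
    print(level(digits), ancestors(digits))
    # below category (1 extra digit(s)) ['03111000', '03110000', '03100000', '03000000']
```

Deriving parents from the digits only works when no level is skipped; where a classification does not follow this pattern, the hierarchy has to come from the source data instead (as with the pscs:level property mentioned for other PSCs).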

The supplementary vocabulary may be used to expand the description of the subject of a contract. The items are made up of an alphanumeric code with a corresponding wording allowing further details to be added regarding the specific nature or destination of the goods to be purchased.

The alphanumeric code is made up of:

  • a first level comprising a letter corresponding to a section;
  • a second level comprising four digits, the first three of which denote a subdivision and the last of which is used for verification purposes.
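
The supplementary code layout described above can be checked with a regular expression. This sketch follows that description only; whether a hyphen separates the verification digit is an assumption (both forms are accepted), and the sample codes are hypothetical.

```python
import re
from typing import Tuple

# Assumed textual form, following the description above: one section letter,
# three subdivision digits and one verification digit, optionally with a hyphen
# before the verification digit (e.g. the hypothetical codes "A123-4" or "A1234").
SUPPLEMENTARY_CODE = re.compile(r"^([A-Z])(\d{3})-?(\d)$")


def parse_supplementary(code: str) -> Tuple[str, str, str]:
    """Return (section, subdivision, verification_digit) for a supplementary code."""
    match = SUPPLEMENTARY_CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a supplementary-vocabulary code: {code!r}")
    section, subdivision, check = match.groups()
    return section, subdivision, check
```

For example, parse_supplementary("A123-4") returns ('A', '123', '4'); the verification digit is only extracted, not validated.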

The resulting dataset comprises the CPV 2008 and CPV 2003 codes and the mappings between them. All this information is publicly available (5-star linked data) via the WESO SPARQL endpoint and a Pubby frontend. The structure of the data and definitions is as follows:

http://purl.org/weso/cpv/2008/03111100 (a CPV 2008 code)
http://purl.org/weso/cpv/definitions/codeIn2003 (the property linking it to CPV 2003)
http://purl.org/weso/cpv/2003/01113100 (the corresponding CPV 2003 code)
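
The codeIn2003 link in the example above can be generated from a table of (CPV 2008, CPV 2003) code pairs. The sketch below assumes such pairs are already available as 8-digit codes without check digits; the URI prefixes and the property come from the example, while the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

CPV_2008 = "http://purl.org/weso/cpv/2008/"
CPV_2003 = "http://purl.org/weso/cpv/2003/"
CODE_IN_2003 = "http://purl.org/weso/cpv/definitions/codeIn2003"


def mapping_triples(pairs: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield one N-Triples line per (CPV 2008 code, CPV 2003 code) pair."""
    for code_2008, code_2003 in pairs:
        for code in (code_2008, code_2003):
            if not re.fullmatch(r"[0-9]{8}", code):
                raise ValueError(f"expected an 8-digit code without check digit: {code!r}")
        yield f"<{CPV_2008}{code_2008}> <{CODE_IN_2003}> <{CPV_2003}{code_2003}> ."


def codes_in_2008(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reverse index: CPV 2003 code -> CPV 2008 codes that point to it."""
    index: Dict[str, List[str]] = {}
    for code_2008, code_2003 in pairs:
        index.setdefault(code_2003, []).append(code_2008)
    return index
```

The reverse index is included in case a single CPV 2003 code corresponds to more than one CPV 2008 code; with the example pair ("03111100", "01113100") it returns {'01113100': ['03111100']}.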

The definitions have been made using the vocabularies:

The whole dataset uses links to other datasets (28,839):

  • GoodRelations and Product Ontology products and descriptions

In order to create all this data we have used different tools:

Collaborators:

Acknowledgements:
This work is part of the MOLDEAS system, developed by the WESO Research Group within 10ders Information Services, a partnership project partially funded by the Spanish Ministry of Industry, Tourism and Trade (code TSI-020100-2010-919) and the European Regional Development Fund (ERDF) under the National Plan of Scientific Research, Development and Technological Innovation 2008-2011, led by Gateway Strategic Consultancy Services and developed in cooperation with Exis-TI.

TO DO List

  • Check broken links
  • Review the design of URIs
  • Create Named graphs to group different divisions/groups/classes/categories
  • Link to other datasets
  • Reconcile all products and services with DBpedia resources
  • Develop a GUI based on Exhibit, SNORQL, etc.
  • Send this dataset and statistics to the Linked Data Cloud
  • Update public procurement notices with the new URIs

Nomenclátor Asturias 2010

DEPRECATED: NEEDS TO BE UPDATED. See:

This dataset, created by SADEI, contains information about the populated places of my region, Asturias, including:

  • Codes to identify the type of a populated place: CC/PP/EE (CC: code of the first-level division, called “Concejo”; PP: code of the second-level division, called “Parroquia Rural”; EE: code of the third-level division, i.e. the actual place)
  • Name in Spanish and Asturian
  • Statistics on altitude, distance, area, number of men and women, and number of apartments (main and non-main)

Places are organized in a three-level hierarchy: Concejo (municipality), Parroquia Rural, and third-level places such as cities, towns, suburbs, etc. Depending on the type of place, some statistics are missing; missing values are indicated by “-1”. For instance, a “Concejo” or a “Parroquia Rural” has no altitude or distance, and third-level places have no area.
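
Since missing statistics are encoded as -1, it is worth converting them before computing anything; otherwise an average altitude, for example, would be dragged down by the sentinel. This minimal sketch assumes each place's statistics arrive as a dictionary; the field names (altitude, distance, area, ...) and level names are hypothetical, and the expected gaps follow the rules stated above.

```python
from typing import Dict, List, Optional

MISSING = -1  # sentinel used in the source data for "statistic not available"

# Statistics that, according to the description above, are absent at each level.
# The level names and field names are hypothetical labels chosen for this sketch.
EXPECTED_MISSING = {
    "concejo": {"altitude", "distance"},
    "parroquia": {"altitude", "distance"},
    "place": {"area"},
}


def clean_stats(record: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Copy of `record` with the -1 sentinel replaced by None."""
    return {name: (None if value == MISSING else value) for name, value in record.items()}


def unexpected_gaps(level: str, record: Dict[str, float]) -> List[str]:
    """Statistics marked -1 that this level is not expected to lack (sorted)."""
    missing = {name for name, value in record.items() if value == MISSING}
    return sorted(missing - EXPECTED_MISSING[level])
```

For example, unexpected_gaps("place", {"altitude": 300, "area": -1, "men": -1}) returns ['men'], flagging a missing value that the rules above do not explain.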

All this information is publicly available (5-star linked data) via the WESO SPARQL endpoint and a Pubby frontend (more information about the dataset can be found in the nomenclator-asturias dataset at thedatahub.org). The structure of the data and definitions is as follows:

  • Nomenclátor statistics definitions. Graph IRI: http://purl.org/weso/nomenclator/stats/ontology. Total: 68 triples. Example: http://purl.org/weso/nomenclator/stats/ontology/physicaldata
  • Nomenclátor statistics dataset. Graph IRI: http://purl.org/weso/nomenclator/asturias/2010/stats. Total: 370,160 triples.
  • Example of query: “Give me all municipalities that have more women than men”

The definitions have been made using the vocabularies:

The whole dataset uses links to other datasets (126,127):

  • 1 link to NUTS
  • 78 links to DBpedia, one per “Concejo”
  • 78,859 links to DBpedia, one per populated place and observation
  • 55,146 links to Reference Data Gov UK, one per populated place and observation
  • 70,904 links to SDMX attributes (sex-m and sex-f)
  • 29 links to GeoLinkedData.es
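
The example query for the statistics dataset (“Give me all municipalities that have more women than men”) can be sketched as follows. The real property and class URIs of the statistics ontology are not reproduced in this post, so the hyp: terms in the SPARQL text and the endpoint address are hypothetical placeholders; only the graph IRI is taken from the statistics dataset listed above. A local Python equivalent of the FILTER over (place, sex, count) observations is included so the logic can be checked without the endpoint.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlencode

# Hypothetical SPARQL: the hyp: terms stand in for the real properties defined
# in the statistics ontology graph, which are not reproduced here.
QUERY = """PREFIX hyp: <http://example.org/hypothetical#>
SELECT ?place ?men ?women
FROM <http://purl.org/weso/nomenclator/asturias/2010/stats>
WHERE {
  ?place a hyp:Concejo .
  ?m hyp:place ?place ; hyp:sex hyp:male ; hyp:count ?men .
  ?f hyp:place ?place ; hyp:sex hyp:female ; hyp:count ?women .
  FILTER (?women > ?men)
}"""


def query_url(endpoint: str, query: str = QUERY) -> str:
    """GET URL for the SPARQL 1.1 Protocol, which sends the query text in `query`."""
    return f"{endpoint}?{urlencode({'query': query})}"


def more_women_than_men(observations: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Local equivalent of the FILTER over (place, sex, count) observations.

    `sex` is assumed to be "sex-m" or "sex-f", mirroring the SDMX codes above,
    and the observations are assumed to be for municipalities (Concejos) only.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"sex-m": 0, "sex-f": 0})
    for place, sex, count in observations:
        totals[place][sex] += count
    return sorted(place for place, t in totals.items() if t["sex-f"] > t["sex-m"])
```

query_url only builds the request (for instance against a placeholder such as http://example.org/sparql); urllib.request.urlopen could then fetch it from a real endpoint once the actual property URIs are substituted.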