Some leisure time hacking…

After some time working on a new public spending effort with my colleagues from the National Technical University of Athens, a Linked Open Data portal of public spending has been released. I contributed to unifying company names and linking product scheme classifications; nevertheless, they are the ones who have done a very good job promoting a lot of public contract metadata to the Linked Open Data initiative. Watch the next video to get an idea of the work:

Publicspending.net from John Fidias on Vimeo.

Hope to continue this fruitful collaboration!

Travelling to Lima and visiting UPC

Last week, I was invited by Prof. Carlos Raymundo of UPC to give a talk at the University and also to participate in some of the courses they teach there. I am very thankful to the UPC team for the excellent organization; they created a fantastic environment to collaborate with students and other people attending the conference. I leave my presentation here (it is in Spanish):

Researching Semantic Web-Overview from Jose María Alvarez

I also took advantage of visiting this great country to look around Lima and take some pictures; you can find the album on Flickr.

I hope we can establish new collaborations and work together on initiatives related to the semantic web, linked data and research/innovation in general.

I will come back soon!

Interview about the Webindex

Last Tuesday I was interviewed by RPA about the participation of the WESO Research Group in this project. It was my first appearance on the radio and I believe the result was pretty good. You can listen to the audio here. Moreover, our head, Labra, was interviewed by other media such as newspapers and other radio stations, and these contents can be accessed via the official communication portal of the University of Oviedo.

Continuing with the Webindex, I intend to provide data and methods to calculate values using R. I think this is a real way to open up the data and encourage its reuse, at least in statistics.

Finally, I am also working on my distributed reasoner and preparing some proposals that mix computational linguistics, computational social choice theory, etc. I will update the blog with my outcomes. This week I also started teaching “Software Design” with my colleagues Benjamín López, César Acebal and Raúl Izquierdo. I am very excited about this topic because I love design patterns, software architectures, etc. I hope to do my best!

Keep in touch!

HPC-Europa2 visit

In February I applied to the HPC-Europa2 Transnational Access programme and I finally got a grant that gives me the opportunity to use the SARA infrastructure to test some algorithms. Thanks to Professor Maarten de Rijke I could select the University of Amsterdam as host, so now I am here for 6 weeks at the University of Amsterdam, in the Institute for Informatics and more specifically in the Intelligent Systems Lab. I am very excited about this opportunity and I will do my best to get a good version of our reasoning prototype. Besides, I would like to start a fruitful collaboration between people in this lab and our research group through publications, projects or whatever.

I would also like to thank all the administrative staff of UVA for their time and consideration. When I arrived last Thursday, within 15 minutes I had a visiting card, a desktop, a Wi-Fi connection and a great view…

I will keep you informed!

PhD Presentation


View more presentations from Jose María Alvarez

CFP: Focussed Topic Issue on “New trends on E-Procurement applying Semantic Technologies”

Call for Papers: “Special issue on New trends on E-Procurement applying Semantic Technologies”

Computers in Industry. An International, Application Oriented Research Journal (Impact Factor: 1.620)

Overview

The aim of this special issue is to collect innovative and high-quality research and industrial contributions regarding E-Procurement processes that can fulfill the needs of this new realm. This special issue aims at exploring the recent advances in the application of Semantic Technologies in the E-Procurement sector soliciting original scientific contributions in the form of theoretical, experimental and real research and case studies.

Important dates and Timeline

15th of July 2012, abstracts due (send directly to the guest editors)
1st of September 2012, invitations sent to authors to submit full papers
28th of February 2013, full papers due (submit only to the Elsevier Editorial System, EES)
1st of May 2013, first round of reviews
1st of July 2013, revised papers due (also submitted to the EES)
1st of September 2013, second round of reviews
1st of November 2013, final papers due (also in the EES)

If you need additional information, please contact the guest editors:

Jose María Alvarez Rodríguez (Assistant Professor, University of Oviedo), Department of Computer Science, Faculty of Science, University of Oviedo, C/Calvo Sotelo, S/N, 33003, Oviedo, Asturias, Spain, E-mail: josem.alvarez@weso.es
José Emilio Labra Gayo (Associate Professor, University of Oviedo), Department of Computer Science, Faculty of Science, University of Oviedo, C/Calvo Sotelo, S/N, 33003, Oviedo, Asturias, Spain, E-mail: labra@uniovi.es, Web: http://www.di.uniovi.es/~labra
Patricia Ordoñez de Pablos (Associate Professor, University of Oviedo), Department of Business Management, School of Economics and Business, University of Oviedo, Campus del Cristo – Avda. del Cristo, s/n 33006 Oviedo, Asturias, Spain, E-mail: patriop@uniovi.es

Product Scheme Classifications

Continuing the activities to promote the CPV as a linked dataset, we have finished the first beta release of new product scheme classifications (PSCs) as linked data in the context of e-procurement. The next diagram shows the ongoing work in the transformation of PSCs (gray ones are not yet transformed):

The process to promote all these PSCs (more information can be found in pscs-catalogue at thedatahub.org) has been carried out stepwise (similar to http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook):

Select the PSCs to be transformed and download the data source (MS Excel in most cases)
Model the information about a PSC item using existing vocabularies. If required, new concepts and relations can be defined, as in the CPV case. Design the URIs.
Transform the data using Google Refine
Create the mappings between a PSC and the Product Ontology (custom Java-based reconciliator adapted to the descriptions of PSC items)
Create the mappings between a PSC and the CPV 2008 (custom Java-based reconciliator between a source PSC and a target PSC)
Validate mappings and links
Add dataset descriptions using the VoID vocabulary
Store in Virtuoso and publish data with Pubby

The definition of a PSC item (?product) comprises the following URI patterns and properties:

URI for datasets: http://purl.org/weso/pscs/{psc}/{year|version}/resource/ds
URI for resources: http://purl.org/weso/pscs/{psc}/{year|version}/resource/{id}
URI for classes and properties: http://purl.org/weso/pscs/{psc}/{year|version}/ontology/
rdf:type <pscs:PSCConcept> (rdf:type skos:Concept)
dcterms:identifier “id” (the id that is part of the URI)
skos:notation “raw id” (the real id that appears in the data source)
skos:prefLabel, gr:description and rdfs:label “description”
skos:inScheme <void:Dataset>, <skos:ConceptScheme>
skos:broaderTransitive/skos:narrowerTransitive <PSCConcept> (in some cases the broader concept of an item cannot be inferred from the codes; for those cases we have defined a custom property called “pscs:level”)
pscs:relatedMatch (mapping between ?product and items of ProductOntology). The next release will include a “confidence” value to establish the weight of matchings.
skos:exactMatch <PSCConcept> (some PSCs already define mappings among them; we reuse this information)
skos:closeMatch <PSCConcept> (mapping between ?product and items of CPV 2008). The next release will include a “confidence” value to establish the weight of matchings.
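
As an illustration of the URI patterns listed above, here is a minimal Python sketch that fills in the {psc}, {year|version} and {id} placeholders. The function names are my own and only serve the example; they are not part of any published tool.

```python
# Base of all PSC URIs, taken from the patterns listed in this post.
BASE = "http://purl.org/weso/pscs"


def dataset_uri(psc: str, version: str) -> str:
    """URI of the void:Dataset that describes one PSC release."""
    return f"{BASE}/{psc}/{version}/resource/ds"


def resource_uri(psc: str, version: str, item_id: str) -> str:
    """URI of a single PSC item; item_id is the dcterms:identifier value."""
    return f"{BASE}/{psc}/{version}/resource/{item_id}"


def ontology_uri(psc: str, version: str) -> str:
    """Namespace for the classes and properties of one PSC release."""
    return f"{BASE}/{psc}/{version}/ontology/"


print(dataset_uri("cpv", "2008"))
print(resource_uri("isic", "v4", "{id}"))
print(ontology_uri("naics", "2012"))
```

Keeping the pattern in one place makes it easy to review the URI design later without touching the transformation scripts.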

The whole linkset of PSCs can be found at http://purl.org/weso/pscs/ and we have also extracted some statistics (PSC void:Dataset, graph IRI and number of triples):

http://purl.org/weso/pscs/cn/2012/resource/ds, http://purl.org/weso/pscs/cn/2012, 137,484
http://purl.org/weso/pscs/cpa/2008/resource/ds, http://purl.org/weso/pscs/cpa/2008, 92,749
http://purl.org/weso/pscs/cpc/2008/resource/ds, http://purl.org/weso/pscs/cpc/2008, 100,819
http://purl.org/weso/pscs/cpv/2003/resource/ds, http://purl.org/weso/pscs/cpv/2003, 546,135
http://purl.org/weso/pscs/cpv/2008/resource/ds, http://purl.org/weso/pscs/cpv/2008, 803,311
http://purl.org/weso/pscs/isic/v4/resource/ds, http://purl.org/weso/pscs/isic/v4, 18,986
http://purl.org/weso/pscs/naics/2007/resource/ds, http://purl.org/weso/pscs/naics/2007, 36,292
http://purl.org/weso/pscs/naics/2012/resource/ds, http://purl.org/weso/pscs/naics/2012, 35,390
http://purl.org/weso/pscs/sitc/v4/resource/ds, http://purl.org/weso/sitc/v4, 70,887

Try this query: “Give me 100 products or services related to ‘construction’ in any PSC that have a mapping with products or services in CPV 2008 (descriptions in English)”
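
A possible shape for that query, sketched as a small Python helper that only builds the SPARQL text (it does not contact any endpoint). It relies on the properties described above (skos:prefLabel for descriptions, skos:closeMatch for mappings to CPV 2008). Two things are assumptions of mine: that labels carry a language tag, and that CPV 2008 items live under http://purl.org/weso/pscs/cpv/2008/resource/, as the URI pattern suggests. The function name build_query is also just for this sketch.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Assumed prefix of CPV 2008 item URIs, derived from the URI pattern above.
CPV_2008_BASE = "http://purl.org/weso/pscs/cpv/2008/resource/"


def build_query(keyword: str, lang: str = "en", limit: int = 100) -> str:
    """Build a SPARQL query for PSC items whose label contains keyword
    and that have a skos:closeMatch link to a CPV 2008 item."""
    # Escape backslashes and quotes so the keyword stays a valid literal.
    safe = keyword.lower().replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX skos: <{SKOS}>
SELECT DISTINCT ?product ?label ?cpv WHERE {{
  ?product skos:prefLabel ?label ;
           skos:closeMatch ?cpv .
  FILTER(STRSTARTS(STR(?cpv), "{CPV_2008_BASE}"))
  FILTER(LANGMATCHES(LANG(?label), "{lang}"))
  FILTER(CONTAINS(LCASE(STR(?label)), "{safe}"))
}}
LIMIT {limit}"""


query = build_query("construction")
print(query)
```

The printed text can be pasted into the SPARQL endpoint form; if labels turn out not to be language-tagged, the LANGMATCHES filter has to be dropped.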


Old-Fashioned Common Procurement Vocabulary 2008 and 2003

The Common Procurement Vocabulary (CPV) establishes a single classification system for public procurement aimed at standardising the references used by contracting authorities and entities to describe the subject of procurement contracts.

The CPV consists of a main vocabulary for defining the subject of a contract, and a supplementary vocabulary for adding further qualitative information.

The main vocabulary is based on a tree structure comprising codes of up to nine digits (an eight-digit code plus a check digit) associated with a wording that describes the type of supplies, works or services forming the subject of the contract:

The first two digits identify the divisions (XX000000-Y);
The first three digits identify the groups (XXX00000-Y);
The first four digits identify the classes (XXXX0000-Y);
The first five digits identify the categories (XXXXX000-Y);

Each of the last three digits gives a greater degree of precision within each category. A ninth digit serves to verify the previous digits.
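
The tree structure described above can be read straight from the digits. The following Python sketch derives the division, group, class and category of a code by keeping the leading digits and padding with zeros. The function name cpv_ancestors is mine, and the check digit is accepted but not verified, since its computation is not described here.

```python
import re

# Eight classification digits, optionally followed by "-" and the check digit.
CPV_CODE = re.compile(r"^(\d{8})(?:-(\d))?$")

# Number of significant leading digits for each level of the tree.
LEVELS = {"division": 2, "group": 3, "class": 4, "category": 5}


def cpv_ancestors(code: str) -> dict:
    """Return the division, group, class and category codes of a CPV code."""
    match = CPV_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a CPV code: {code!r}")
    digits = match.group(1)
    # Keep the first n digits and zero-fill the rest of the eight positions.
    return {name: digits[:n] + "0" * (8 - n) for name, n in LEVELS.items()}


levels = cpv_ancestors("03111100")
for name, value in levels.items():
    print(name, value)
```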

The supplementary vocabulary may be used to expand the description of the subject of a contract. The items are made up of an alphanumeric code with a corresponding wording allowing further details to be added regarding the specific nature or destination of the goods to be purchased.

The alphanumeric code is made up of:

a first level comprising a letter corresponding to a section;
a second level comprising four digits, the first three of which denote a subdivision and the last one being for verification purposes

(Information available at: http://simap.europa.eu/codes-and-nomenclatures/codes-cpv/codes-cpv_en.htm)

The dataset comprises the CPV 2008 and CPV 2003 codes and the mappings between them. All this information is publicly available via the WESO SPARQL endpoint (5 star linked data) and a Pubby frontend. The structure of the data and definitions is as follows:

CPV 2008. Graph IRI: http://purl.org/weso/cpv/2008. Total: 556,335 triples.
- Scheme: http://purl.org/weso/cpv/2008/scheme
- Dump file (Turtle) (25 MB)
- Division: http://purl.org/weso/cpv/2008/03000000
- Group: http://purl.org/weso/cpv/2008/03100000
- Class: http://purl.org/weso/cpv/2008/03110000
- Category: http://purl.org/weso/cpv/2008/03111000 | http://purl.org/weso/cpv/2008/03111100
- Mapping example (subject, predicate, object): http://purl.org/weso/cpv/2008/03111100 http://purl.org/weso/cpv/definitions/codeIn2003 http://purl.org/weso/cpv/2003/01113100
CPV 2003. Graph IRI: http://purl.org/weso/cpv/2003. Total: 191,430 triples.
- Example: http://purl.org/weso/cpv/2003/01113100
- Scheme: http://purl.org/weso/cpv/2003/scheme
- Dump file (Turtle) (7.8 MB)
CPV Definitions. Graph IRI: http://purl.org/weso/cpv/definitions. Total: 43 triples.
- Dump file (Turtle) (7.4 KB)

The definitions have been made using the vocabularies:

The whole dataset uses links to other datasets (28,839):

GoodRelations and Product Ontology products and descriptions

In order to create all this data we have used different tools:

Google Refine and the RDF extension (to produce data)
Pubby (to publish data)
OpenLink Virtuoso (to store data)

Collaborators:

José Emilio Labra (Main Researcher of WESO Research Group at the University of Oviedo)
The first version of the CPV was developed in conjunction with my colleagues of CTIC: Luis Polo and Emilio Rubiera in 2007.

Acknowledgements:

This work is part of the MOLDEAS system developed by the WESO Research Group within the partnership project “10ders Information Services”, partially funded by the Spanish Ministry of Industry, Tourism and Trade with code TSI-020100-2010-919 and the European Regional Development Fund (ERDF) according to the National Plan of Scientific Research, Development and Technological Innovation 2008-2011, led by Gateway Strategic Consultancy Services and developed in cooperation with Exis-TI.

TO DO List

Check broken links
Review the design of URIs
Create Named graphs to group different divisions/groups/classes/categories
Link to other datasets
Reconcile all products and services with the DBPedia resources
Develop a GUI based on Exhibit, SNORQL, etc.
Send this dataset and statistics to the Linked Data Cloud
Update public procurement notices with the new URIs

Nomenclátor Asturias 2010

DEPRECATED: NEEDS TO BE UPDATED, See:

This dataset, created by SADEI, contains information about the populated places of my area, Asturias, including:

Codes to identify the type of a populated place: CC/PP/EE (CC: code of the first-level division, called “Concejo”; PP: code of the second-level division, called “Parroquia Rural”; EE: code of the third-level division, the actual place)
Name in Spanish and Asturian
Statistics about altitude, distance, area, men, women and number of apartments (main and not main)

The places form a hierarchy of 3 levels: Concejo (municipality), Parroquia Rural, and others such as city, town, suburb, etc. Depending on the type of place some statistics are missing, and this is indicated with a value of “-1”. For instance, “Concejo” and “Parroquia Rural” do not have altitude and distance, and third-level places do not have area.
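
Anyone consuming these statistics has to treat “-1” as “not available” rather than as a real measurement. A minimal Python sketch of that clean-up follows; the function name and the sample record are invented for illustration and do not come from the dataset.

```python
# Sentinel used in the source data to mark a statistic that does not apply.
MISSING = -1


def clean_stats(stats: dict) -> dict:
    """Return a copy of stats with the -1 sentinel replaced by None.

    Both the number -1 and the string "-1" are treated as missing.
    """
    return {
        name: None if value in (MISSING, str(MISSING)) else value
        for name, value in stats.items()
    }


# Made-up record shaped like a "Concejo": it has an area but no altitude/distance.
sample = {"area": 12.5, "altitude": -1, "distance": "-1"}
cleaned = clean_stats(sample)
print(cleaned)
```

Replacing the sentinel before any aggregation avoids averages or sums that are silently dragged down by the -1 values.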

All this information is publicly available via the WESO SPARQL endpoint (5 star linked data) and a Pubby frontend (more information about the dataset can be found in the nomenclator-asturias dataset at thedatahub.org). The structure of the data and definitions is as follows:

Nomenclátor definitions. Graph IRI: http://purl.org/weso/nomenclator/definitions. Total: 101 triples. Example at: http://purl.org/weso/nomenclator/ontology/Concejo
- Dump file (Turtle) (6.3 KB)
Nomenclátor populated places dataset. Graph IRI: http://purl.org/weso/nomenclator/asturias/2010. Total: 60,196 triples. Example at: http://purl.org/weso/nomenclator/asturias/2010/resource/53/00/00
- Scheme: http://purl.org/weso/nomenclator/asturias/2010/resource/ds
- Dump file (Turtle) (2.7 MB)
Nomenclátor statistics definitions. Graph IRI: http://purl.org/weso/nomenclator/stats/ontology. Total: 68 triples. Example at: http://purl.org/weso/nomenclator/stats/ontology/physicaldata
- Dump file (Turtle) (3.5 KB)
Nomenclátor statistics dataset. Graph IRI: http://purl.org/weso/nomenclator/asturias/2010/stats. Total: 370,160 triples.
- Dump file (Turtle) (27 MB)
- Area: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/physicaldata/area/53/00/00
- Altitude: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/physicaldata/altitude/53/08/02
- Distance: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/physicaldata/distance/53/08/02
- Men: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/sex/m/53/08/02
- Women: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/sex/f/53/08/02
- Apartment (main): http://purl.org/weso/nomenclator/asturias/2010/stats/resource/apartment/main/53/08/02
- Apartment (not main): http://purl.org/weso/nomenclator/asturias/2010/stats/resource/apartment/notmain/53/08/02
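
The statistics URIs above follow a regular shape: the base of the statistics dataset, a path for the indicator, and the CC/PP/EE code of the place. A small Python sketch that rebuilds them, assuming the pattern holds for every place (the indicator keys and the function name are my own):

```python
STATS_BASE = "http://purl.org/weso/nomenclator/asturias/2010/stats/resource"

# Path segment of each indicator, copied from the example URIs in this post.
INDICATORS = {
    "area": "physicaldata/area",
    "altitude": "physicaldata/altitude",
    "distance": "physicaldata/distance",
    "men": "sex/m",
    "women": "sex/f",
    "apartment_main": "apartment/main",
    "apartment_notmain": "apartment/notmain",
}


def stats_uri(indicator: str, concejo: str, parroquia: str, place: str) -> str:
    """Build the URI of one observation from the CC/PP/EE code of a place."""
    if indicator not in INDICATORS:
        raise KeyError(f"unknown indicator: {indicator!r}")
    return f"{STATS_BASE}/{INDICATORS[indicator]}/{concejo}/{parroquia}/{place}"


print(stats_uri("women", "53", "08", "02"))
print(stats_uri("area", "53", "00", "00"))
```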

Example of query: “Give me all municipalities that have more women than men”

The definitions have been made using the vocabularies:

The whole dataset uses links to other datasets (126,127):

1 link to NUTS
78 links to DBPedia, one for each “Concejo”
78,859 links to DBPedia, one for each populated place and observation
55,146 links to Reference Data Gov UK, one for each populated place and observation
70,904 links to SDMX attributes (sex-m and sex-f)
29 links to GeoLinkedData.es
