Week #3 in Thessaloniki

This week I will keep updating this post because I am reading a lot of papers and I need a way to track them. Following the same structure as in previous weeks, I leave some links to the activities I am carrying out:

Reading

I have focused on several interesting subjects: Statistics (Bayesian networks), Data Streams, Feedback Control Loops, Autonomous Computing and e-Learning systems (the last one just out of personal interest). I have started, and finished, the following list of papers and books:

Open-Loop Feedback Control of Nonlinear Stochastic Systems Based on Deterministic Dirac Mixture Densities
Probabilistic framework for opportunistic spectrum management in cognitive ad hoc networks
Probabilistic Framework for Sensor Management (Book)
A Probabilistic Approach to Mixed Open-loop and Closed-loop Control, with Application to Extreme Autonomous Driving
When cloud computing meets with Semantic Web: A new design for e-portfolio systems in the social media era
Sharing innovative teaching experience in higher education on the Web. An interdisciplinary study on a contextualized Web 2.0 application for community building and teacher training
Crash Course on Data Stream Algorithms (Slides)
Data Streams: Models and Algorithms
Data Streams: Algorithms and Applications
Data Streams Algorithms
A Survey of Graph Algorithms in Extensions to the Streaming Model of Computation
Big Data versus the Crowd: Looking for Relationships in All the Right Places
Elementary: Large-scale Knowledge-base Construction via Machine Learning and Statistical Inference
Hazy: Making it Easier to Build and Maintain Big-data Analytics
Expert finding systems (some articles from the following Google search)
Expertise ranking algorithms (some articles from the following Google search); these last two items are just out of personal interest.
Bayesian Networks
- Bayesian networks [1, 2] (in Spanish)
- Bayesian networks in R
- Machine Learning & Statistical Learning, R packages
- Causality: Models, Reasoning, and Inference

Writing and reviewing

I have started on the structure and first contents of two papers and one special issue proposal.
I have also completed a workshop proposal for the 9th ACM International Conference on Cloud and Autonomic Computing.
I have finished the review of a paper for the journal Current Topics in Medicinal Chemistry (impact factor 4.174).
I have finished the review of a paper for the International Conference on Marketing – Challenging Environment (ICOM 2013).
I have also reviewed two papers for our Special Issue (just an initial review).

I would like to leave the link to an article about “How to review a paper”, an excellent guide for evaluating your reviews and taking into account your responsibilities as a reviewer.

Coding and Tools

I have not made any relevant progress on development tasks, but I have been refreshing my know-how on R.

Métodos Estadísticos con R y R Commander (Spanish guide)
Estadística Básica con R y R–Commander (Spanish basic statistics method)
R-intro (Spanish version)
R for Beginners
Test of Rome Java library (RSS and Atom Utilities for Java)
Weka (Data mining Java library)

Teaching

I have finished the evaluation of the students in Health Information Systems and I am very proud of the marks and the work carried out by the students during the last months. I have some links to their work building mashups, but I prefer not to leave the links here due to privacy issues.

WebIndex Launch

Today is the official launch of the Web Index. Over the last months I have collaborated, through my activities in the WESO Research Group, with the Web Foundation to promote its statistical data following the Linked Data principles. I think we have published an appropriate version of this data and I hope to continue this fruitful collaboration with my new colleagues in the coming months.

You can find more information about the Web Index as Linked Data at http://data.webfoundation.org/.

If you have any comments, suggestions, etc., please feel free to contact me at any time.

Best,

R & Big Data Intro

I am very committed to improving my know-how in delivering solutions that deal with Big Data in a high-performance fashion. I am continuously looking for tools, algorithms, recipes (e.g. Data Science e-book), papers and technology to enable this kind of processing, because it is considered to be relevant in the coming years, although it is already a reality!

Last week I resumed using R and Rcmdr to analyze and extract statistics from my PhD experiments using the Wilcoxon test. I started with R three years ago, when I developed a simple graphical interface in Visual Studio to input data and send operations to the R interpreter. The motivation for this work was to help a colleague with his final degree project, and the experience was very rewarding.

What is the relation between Big Data and R?

The explanation is simple: a key enabler for providing added-value services is managing and learning from historical logs. By putting together an excellent statistics suite and the Big Data realm, it is possible to meet the requirements of a great variety of services in domains like NLP, recommendation, business intelligence, etc. For instance, there are approaches that combine R with Hadoop, such as RICARDO or Parallel R, and new companies such as Revolution Analytics are emerging to offer R-based services for processing Big Data.

This post was a short introduction to R as a tool to exploit Big Data. If you’re interested in this kind of approach, please take a look at the following presentation by Lee Edlefsen:

Scalable Data Analysis in R Webinar Presentation


Keep on researching and learning!

Nomenclátor Asturias 2010

DEPRECATED: NEEDS TO BE UPDATED. See:

This dataset, created by SADEI, contains information about the populated places of my region, Asturias, including:

Codes to identify the type of a populated place: CC/PP/EE (CC: code of the first-level division, called “Concejo”; PP: code of the second-level division, called “Parroquia Rural”; EE: code of the third-level division, the actual place)
Name in Spanish and Asturian
Statistics about: altitude, distance, area, men, women and number of apartments (main and not main)

The places are structured as a hierarchy of 3 levels: Concejo (municipality), Parroquia Rural, and others such as city, town, suburb, etc. Depending on the type of place, some statistics are missing, which is indicated with a value of “-1”. For instance, “Concejo” and “Parroquia Rural” do not have “altitude” and “distance”, and third-level places do not have “area”.
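As a minimal sketch of how a consumer of this data might handle the codes and the “-1” marker, the following Python snippet splits a CC/PP/EE code and maps “-1” to a missing value. The names PlaceCode, parse_code, level and clean are hypothetical, and the rule that “00” in a lower position marks a higher-level entity is only an assumption inferred from the example codes in this post (53/00/00 and 53/08/02), not something stated by SADEI.

```python
from typing import NamedTuple, Optional

MISSING = -1  # sentinel the dataset uses for a statistic that does not apply


class PlaceCode(NamedTuple):
    concejo: str    # CC, first-level division
    parroquia: str  # PP, second-level division
    place: str      # EE, third-level division


def parse_code(code: str) -> PlaceCode:
    """Split a 'CC/PP/EE' code into its three parts."""
    parts = code.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected CC/PP/EE, got {code!r}")
    return PlaceCode(*parts)


def level(code: PlaceCode) -> str:
    """Guess the hierarchy level of a code.

    ASSUMPTION: '00' in a lower position means 'no such subdivision'.
    """
    if code.parroquia == "00" and code.place == "00":
        return "concejo"
    if code.place == "00":
        return "parroquia"
    return "place"


def clean(value: float) -> Optional[float]:
    """Map the -1 sentinel to None so it never enters sums or averages."""
    return None if value == MISSING else value


print(level(parse_code("53/00/00")), clean(-1))
```

Converting the sentinel to None early means an accidental sum over “area” or “altitude” fails loudly instead of silently including -1.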

In any case, all this information is publicly available via the WESO SPARQL endpoint (5-star Linked Data) and a Pubby frontend (more information about the dataset can be found in the nomenclator-asturias dataset at thedatahub.org). The structure of the data and definitions is as follows:

Nomenclátor definitions. Graph IRI: http://purl.org/weso/nomenclator/definitions. Total: 101 triples. Example at: http://purl.org/weso/nomenclator/ontology/Concejo
- Dump file (Turtle) (6.3 KB)

Nomenclátor populated places dataset. Graph IRI: http://purl.org/weso/nomenclator/asturias/2010. Total: 60,196 triples. Example at: http://purl.org/weso/nomenclator/asturias/2010/resource/53/00/00
- Scheme: http://purl.org/weso/nomenclator/asturias/2010/resource/ds
- Dump file (Turtle) (2.7 MB)

Nomenclátor statistics definitions. Graph IRI: http://purl.org/weso/nomenclator/stats/ontology. Total: 68 triples. Example at: http://purl.org/weso/nomenclator/stats/ontology/physicaldata
- Dump file (Turtle) (3.5 KB)

Nomenclátor statistics dataset. Graph IRI: http://purl.org/weso/nomenclator/asturias/2010/stats. Total: 370,160 triples.
- Dump file (Turtle) (27 MB)
- Area: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/physicaldata/area/53/00/00
- Altitude: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/physicaldata/altitude/53/08/02
- Distance: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/physicaldata/distance/53/08/02
- Men: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/sex/m/53/08/02
- Women: http://purl.org/weso/nomenclator/asturias/2010/stats/resource/sex/f/53/08/02
- Apartment (main): http://purl.org/weso/nomenclator/asturias/2010/stats/resource/apartment/main/53/08/02
- Apartment (not main): http://purl.org/weso/nomenclator/asturias/2010/stats/resource/apartment/notmain/53/08/02
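
The example URIs above follow a regular pattern, so they can be built from a place code. The following Python snippet is only a sketch of that pattern: the path segments are copied from the examples listed above, while the function names (place_uri, stat_uri) and the dictionary keys are hypothetical, and I am assuming the pattern holds for every place and statistic.

```python
BASE = "http://purl.org/weso/nomenclator/asturias/2010"

# Path segment of each statistic, copied from the example URIs in this post.
# The dictionary keys are illustrative names, not part of the dataset.
STAT_PATHS = {
    "area": "physicaldata/area",
    "altitude": "physicaldata/altitude",
    "distance": "physicaldata/distance",
    "men": "sex/m",
    "women": "sex/f",
    "apartment_main": "apartment/main",
    "apartment_notmain": "apartment/notmain",
}


def place_uri(code: str) -> str:
    """URI of a populated place, given its 'CC/PP/EE' code."""
    return f"{BASE}/resource/{code}"


def stat_uri(stat: str, code: str) -> str:
    """URI of one statistic for a place (assumes the pattern is regular)."""
    if stat not in STAT_PATHS:
        raise KeyError(f"unknown statistic: {stat!r}")
    return f"{BASE}/stats/resource/{STAT_PATHS[stat]}/{code}"


print(place_uri("53/00/00"))
print(stat_uri("women", "53/08/02"))
```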

Example of query: “Give me all municipalities that have more women than men”
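
The real query would be written in SPARQL against the endpoint, using property names that are not reproduced in this post. As a rough sketch of just the comparison logic, assuming the men and women counts of each “Concejo” have already been fetched, it could look like this in Python. The function name, the input shape and the sample values are all hypothetical; the codes and counts are made-up placeholders, not real data from the dataset.

```python
from typing import Dict, List, Tuple

MISSING = -1  # value the dataset uses when a statistic is not available


def more_women_than_men(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the codes whose women count exceeds the men count.

    `counts` maps a Concejo code to a (men, women) pair; entries with a
    missing value are skipped instead of being compared.
    """
    return sorted(
        code
        for code, (men, women) in counts.items()
        if men != MISSING and women != MISSING and women > men
    )


# Made-up placeholder codes and counts, for illustration only.
sample = {
    "AA/00/00": (100, 120),
    "BB/00/00": (90, 80),
    "CC/00/00": (MISSING, 50),
}
print(more_women_than_men(sample))  # ['AA/00/00']
```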

The definitions have been made using the vocabularies:

The whole dataset uses links to other datasets (126,127):

1 link to NUTS
78 links to DBpedia, one per “Concejo”
78,859 links to DBpedia, one per populated place and observation
55,146 links to Reference Data Gov UK, one per each populated place and observation
70,904 links to SDMX attributes (sex-m and sex-f)
29 links to GeoLinkedData.es
