Week #3 in Thessaloniki

This week I will keep updating this post, because I am reading a lot of papers and I need a way to track them. Following the same structure as in previous weeks, I leave some links to the activities I am carrying out:

Reading

I have focused on some interesting subjects: statistics (Bayesian networks), data streams, feedback control loops, autonomic computing, and e-learning systems (the last one purely out of personal interest). I have started, and finished, the following list of papers and books:

Writing and reviewing

I would like to leave a link to the article "How to review a paper", an excellent guide for evaluating your reviews and keeping in mind your responsibilities as a reviewer.

Coding and Tools

I have not made any relevant progress on development tasks, but I have been refreshing my know-how in R.

Teaching

I have finished evaluating the students in Health Information Systems, and I am very proud of the marks and of the work carried out by the students over the last months. I have some links to the mashups they built, but I prefer not to post them here due to privacy concerns.

R & Big Data Intro

I am very committed to enhancing my know-how in delivering solutions that deal with Big Data in a high-performance fashion. I am continuously looking for tools, algorithms, recipes (e.g. the Data Science e-book), papers, and technologies that enable this kind of processing, because it is not only considered relevant for the coming years: it is already a reality!

Last week I went back to using R and Rcmdr to analyze my PhD experiments and extract statistics from them using the Wilcoxon test. I first started with R three years ago, when I developed a simple graphical interface in Visual Studio to input data and send operations to the R interpreter; the motivation for that work was to help a colleague with his final degree project, and the experience was very rewarding.
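For readers who have not used it, this is roughly what that kind of analysis looks like in plain R. The data below is entirely made up for illustration (my actual experiment data is not shown here); the test itself is the standard paired Wilcoxon signed-rank test from R's stats package:

```r
# Hypothetical paired measurements (e.g. response times in seconds)
# from two configurations of the same system. These numbers are
# invented for the example, not taken from my experiments.
before <- c(12.1, 9.8, 14.3, 11.0, 10.5, 13.2, 9.9, 12.7)
after  <- c(10.4, 9.1, 12.8, 10.9,  9.7, 11.6, 9.3, 11.3)

# Paired Wilcoxon signed-rank test: a non-parametric alternative to
# the paired t-test, useful when normality cannot be assumed.
result <- wilcox.test(before, after, paired = TRUE)
print(result$p.value)
```

Rcmdr exposes the same test through its menus, but running `wilcox.test` directly makes it easy to script over many experiment files.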

What is the relation between Big Data and R?

The explanation is simple: a key enabler for providing added-value services is managing and learning from historical logs, so by putting an excellent statistics suite together with the Big Data realm it becomes possible to meet the requirements of a great variety of services from domains like NLP, recommendation, business intelligence, etc. For instance, there are approaches that combine R with Hadoop, such as RICARDO or Parallel R, and new companies are emerging that offer R-based services to process Big Data, like Revolution Analytics.
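To give a flavour of the idea without depending on any of those frameworks, here is a conceptual sketch in base R of the map-and-reduce pattern that Hadoop-style systems distribute across machines. Everything here runs locally on a toy word count; the chunk contents are invented for the example:

```r
# Input split into chunks, as a distributed file system would do.
chunks <- list("big data needs r",
               "r needs big memory",
               "data data data")

# "Map" step: each chunk independently emits its words.
# In a real cluster, each lapply call would run on a different node.
mapped <- unlist(lapply(chunks, function(ch) strsplit(ch, " ")[[1]]))

# "Reduce" step: aggregate the emitted values per key (word).
counts <- table(mapped)
print(counts[["data"]])  # 4
```

The R-on-Hadoop projects mentioned above essentially let you express the map and reduce functions in R while the framework takes care of distributing the chunks and shuffling the keys.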

This post was a short introduction to R as a tool for exploiting Big Data. If you are interested in these kinds of approaches, please take a look at the following presentation by Lee Edfelsen:

Keep on researching and learning!