Logic and Social Choice Theory

I have just finished reading an introduction to social choice theory, the formal study of mechanisms for collective decision making. The article presents all the background required to get started in this discipline. I reached this paper through the UVA website, and I think the concepts it presents could be applied to many use cases, for instance matchmaking between organizations and public contracts, or perhaps mediation in Linked Data. I am now also very interested in work related to “The Wisdom of Crowds” and, to the best of my knowledge, this kind of research and tooling can help to validate the results generated by a crowd. There are also other relevant concepts and definitions that I did not know (in this context), such as dictatorship, liberalism, positive responsiveness and manipulation, among others. Furthermore, the author presents some logic concepts that have emerged to support social choice theory with a formal syntax and semantics: there are approaches based on FOL, HOL and other logics that try to formalize the underlying concepts of the theory. I would like to highlight the “Doctrinal Paradox”, which makes me feel uncomfortable with existing methods of judging: depending on whether a “premise-based procedure” or a “conclusion-based procedure” is used to reach a decision, the final result can change even though both procedures are correct!
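
To make the Doctrinal Paradox concrete, here is a minimal sketch in Python. It assumes the usual textbook setting of three judges, two premises p and q, and the conclusion "p and q"; the concrete judgments and the function names are my own illustration and are not taken from the paper.

```python
# Illustrative sketch of the Doctrinal Paradox (hypothetical judgments).
# Each judge accepts or rejects two premises, p and q; the conclusion is "p and q".

def majority(votes):
    """Return True when strictly more than half of the votes are True."""
    votes = list(votes)
    return sum(votes) > len(votes) / 2

def premise_based(judgments):
    """Vote on each premise separately, then derive the conclusion."""
    p = majority(j["p"] for j in judgments)
    q = majority(j["q"] for j in judgments)
    return p and q

def conclusion_based(judgments):
    """Each judge derives their own conclusion; then vote on the conclusions."""
    return majority(j["p"] and j["q"] for j in judgments)

# Assumed example: every judge is individually consistent.
judges = [
    {"p": True, "q": True},
    {"p": True, "q": False},
    {"p": False, "q": True},
]

print("premise-based:", premise_based(judges))
print("conclusion-based:", conclusion_based(judges))
```

With these judgments a majority accepts p and a majority accepts q, so the premise-based procedure accepts the conclusion, while only one judge accepts "p and q" individually, so the conclusion-based procedure rejects it.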

If you are interested in this kind of work or research, you can follow the autumn course on Computational Social Choice taught by Ulle Endriss as part of a master's programme in Logic.

Keep in touch!

Re-reasoning starting…

Over the last few days I have read a lot of papers on different topics such as FOL, inference, artificial intelligence, large-scale reasoning, rule engines, etc. I have collected all these references in the ROCAS wiki, with the aim of keeping track of all the relevant work that can help me to finally develop our semantic reasoner for large datasets.

This afternoon I found a paper entitled “Making Web-Scale Semantic Reasoning More Service-Oriented: The Large Knowledge Collider”, which presents the whole architecture of the well-known LarKC project. In some sense the ROCAS project was inspired by this European project, but with a restricted scope and different objectives. This paper has two main points of interest for me:

One of the authors is Zhisheng Huang, who helped me in 2004 to use his Distributed Logic Programming system to animate humanoids while I was developing the final degree project for my Bachelor's degree.
From a research point of view, the authors present a compilation of the work done during the LarKC project that should be relevant to ROCAS. It is not a research paper but a good summary of the project.

This is a short post, but I want to highlight that the world is small enough to meet the same people (in this case researchers) again and again! It is incredible! 🙂

Finally, I would also like to report my first progress in the development. I have deployed a Hadoop job that performs the classical graph algorithm “Breadth-First Search”. This is one of the options I am considering for performing reasoning tasks… the other approaches can be summarized as follows:

1. Distribute the Jena rule engine (reusing source code)
2. Develop from scratch the typical backward-chaining engine using unification and resolution
3. Mix of 1 and 2 to avoid parsing rules, matching triples, etc.
4. Build a graph (rules in backward chaining can be represented as an AND/OR tree) and try to infer new facts using unification and search.
…
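
As a rough idea of how Breadth-First Search fits the map/reduce model, here is a minimal in-memory sketch in Python. It is not the Hadoop job itself: the map, shuffle and reduce phases are simulated with plain functions and a dictionary, and the graph, the record layout and all function names are assumptions made for illustration.

```python
from collections import defaultdict
import math

def bfs_map(node, record):
    """Map phase: re-emit the node's record and propose distances for neighbours."""
    distance, neighbours = record
    yield node, ("NODE", distance, neighbours)
    if distance != math.inf:
        for neighbour in neighbours:
            yield neighbour, ("DIST", distance + 1, None)

def bfs_reduce(node, values):
    """Reduce phase: keep the adjacency list and the smallest distance seen."""
    best, neighbours = math.inf, []
    for kind, distance, adjacency in values:
        if kind == "NODE":
            neighbours = adjacency
        best = min(best, distance)
    return node, (best, neighbours)

def bfs(graph, source):
    """Run map/reduce rounds until no distance changes."""
    state = {n: (0 if n == source else math.inf, adj) for n, adj in graph.items()}
    while True:
        grouped = defaultdict(list)  # stands in for the shuffle step
        for node, record in state.items():
            for key, value in bfs_map(node, record):
                grouped[key].append(value)
        new_state = dict(bfs_reduce(k, v) for k, v in grouped.items())
        if new_state == state:
            return {n: d for n, (d, _) in new_state.items()}
        state = new_state

# Hypothetical directed graph; "f" cannot be reached from "a".
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": [], "f": ["a"]}
print(bfs(graph, "a"))
```

Each round extends the frontier by one hop, so the number of rounds grows with the diameter of the graph; in a real cluster every round would be a separate job, which is one reason to compare this option against the others in the list.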

Let’s rock it!

HPC-Europa2 visit

In February I applied to the HPC-Europa2 Transnational Access programme and I finally got a grant that gives me the opportunity to use the SARA infrastructure to test some algorithms. Thanks to Professor Maarten de Rijke I could select the University of Amsterdam as host, so… now I am here for 6 weeks at the University of Amsterdam, in the Institute for Informatics and more specifically in the Intelligent Systems Lab. I am very excited about this opportunity and I will do my best to get a good version of our reasoning prototype. Besides, I would like to start a fruitful collaboration between the people in this lab and our research group through publications, projects or whatever.

I would also like to thank all the administrative staff of UVA for their time and consideration. When I arrived last Thursday, within 15 minutes I had a visiting card, a desktop, a Wi-Fi connection and a great view…

I will keep you informed!

Review of “Mining of Massive Datasets”

I have finally finished my review of this excellent book about mining massive datasets. The sheer mass of data on the web is continuously growing, and a lot of new methods, algorithms and tools are emerging to deal with it, in some cases without providing a formal model to process the information. In this book, the authors present a compilation of the most widely used algorithms (and their formal definitions) to build recommendation systems based on data mining techniques.

I strongly recommend reading this book because it focuses on data mining of very large amounts of data that do not fit in main memory. Currently this situation arises in the management of digital libraries, the analysis of social networks, bioinformatics, etc., where the processing of large datasets is necessary. The main topics can be seen in the following figures but, according to the authors, you will learn the following concepts:

Distributed file systems and map/reduce approaches as a tool for creating parallel algorithms
Methods to estimate and compute similarity for similarity search
Processing of data streams with specialized algorithms
Technology of existing search engines: PageRank, link-spam detection, etc.
Frequent-itemset mining
Algorithms for clustering very large and high-dimensional datasets
Two main applications of these techniques: advertising and recommendation systems
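
As a taste of the search-engine topic in the list above, here is a minimal PageRank sketch in Python using power iteration with teleportation. It is only an illustration under stated assumptions: the three-page link graph is made up, beta=0.85 is a commonly used value rather than one taken from the book, and the code assumes that every page has at least one out-link (no dead ends).

```python
def pagerank(links, beta=0.85, iterations=50):
    """Power iteration with teleportation; assumes every page has out-links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the teleportation share first.
        new_rank = {page: (1.0 - beta) / n for page in pages}
        # Then each page splits beta * its rank evenly among its out-links.
        for page, targets in links.items():
            share = beta * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web graph: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

For datasets that do not fit in main memory, the inner loop is the part that would be expressed as a map step (emit shares) followed by a reduce step (sum shares per page).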

Nevertheless, I miss a section about real-time processing of large amounts of data, as opposed to streaming techniques.

Hulu’s Recommendation System

Continuing with the review of existing recommender systems on multimedia sites, I have found, through Marcos Merino, the recommendation engine provided by Hulu (an online video service that offers a selection of hit shows, clips, movies and more).

It brings together a large selection of videos from over 350 content companies, including FOX, NBCUniversal, ABC, The CW, Univision, Criterion, A&E Networks, Lionsgate, Endemol, MGM, MTV Networks, Comedy Central, National Geographic, Digital Rights Group, Paramount, Sony Pictures, Warner Bros., TED and more. (Hulu, About)

But what is the underlying technology in Hulu?

According to their tech blog, they have put a lot of effort into providing a great recommendation engine, in which they decided to recommend shows to users instead of individual videos. This way content can be organized, since videos from the same show are usually closely related. As with Netflix, one of the drivers of the recommendation is user behavior data (implicit and explicit feedback). The algorithm implemented in Hulu is based on a collaborative filtering approach (user- or item-based), but the most important part lies in Hulu’s architecture, which comprises the following components:

User profile builder
Recommendation core
Filtering
Ranking
Explanation

Besides, they have an offline system for data processing that supports the aforementioned processes; it is based on a data center, a topic model, a related table generator, a feedback analyzer and a report generator. On top of these components and processes they have applied an item-based collaborative filtering algorithm to make recommendations. One of the key points when evaluating recommendations is “Novelty”:

Just because a recommendation system can accurately predict user behavior does not mean it produces a show that you want to recommend to an active user. (Hulu, Tech Blog)

Other key points of their approach lie in explanation-based diversity and temporal diversity. This shows that the problems of recommending information resources in different domains are always similar. Nevertheless, depending on the domain (user behavior, type of content, etc.), new metrics such as novelty can emerge. On the other hand, real-time capabilities, offline processing and performance are again key enablers of a “good” recommendation engine, apart from accuracy. Some interesting lessons from Hulu’s experience are highlighted below:

Explicit Feedback data is more important than implicit feedback data
Recent behaviors are much more important than old behaviors
Novelty, Diversity, and offline Accuracy are all important factors
Most researchers focus on improving offline accuracy, such as RMSE, precision/recall. However, recommendation systems that can accurately predict user behavior alone may not be good enough for practical use. A good recommendation system should consider multiple factors together. In our system, after considering novelty and diversity, the CTR has improved by more than 10%. Please check this document out: “Automatic Generation of Recommendations from Data: A Multifaceted Survey” (a technical report from the School of Information Technology at Deakin University, Australia)
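
To illustrate the item-based collaborative filtering idea mentioned above, here is a minimal sketch in Python. It is not Hulu's implementation: the feedback data, the use of cosine similarity and all names are assumptions chosen to keep the example small.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical feedback: user -> {show: strength of interest}
ratings = {
    "ann": {"show_a": 5, "show_b": 4},
    "bob": {"show_a": 4, "show_b": 5, "show_c": 2},
    "eve": {"show_b": 1, "show_c": 5, "show_d": 4},
}

def item_vectors(ratings):
    """Invert the user -> show map into show -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, shows in ratings.items():
        for show, value in shows.items():
            vectors[show][user] = value
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(user, ratings, top_n=2):
    """Score each unseen show by its similarity to the shows the user rated."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[show]) * value
            for show, value in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ann", ratings))
```

A sketch like this only optimizes for predicted interest; the lessons above suggest that a production system would additionally weight recent behavior more heavily and re-rank the candidates for novelty and diversity.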

But in which components or processes can semantic technologies help recommenders?

Taking into account the main drivers of the semantic web, the use of semantics can be part of some processes (Mining Data Semantics-MDS2012) such as:

Classification and prediction in heterogeneous networks
Pattern-analysis methods
Link mining and link prediction
Semantic search over heterogeneous networks
Mining with user interactions
Semantic mining with light-weight reasoning
Extending LOD and quality of LOD: disambiguation, identity, provenance, integration
Personalized mining in heterogeneous networks
Domain specific mining (e.g., Life Science and Health Care)
Collective intelligence mining
…

Finally, I will continue reviewing the main recommendation services of large social networks (sorted by name) such as Amazon, Facebook, Foursquare, Linkedin, Mendeley or Twitter, in order to make a complete comparison according to different variables and features of the algorithms: feedback, real time, domain, user behavior, etc. After that, my main objective will be to implement a real use case in a distributed environment, merging semantic technologies and recommendation algorithms, to demonstrate whether semantics can improve the results (accuracy, etc.) of existing approaches.

R & Big Data Intro

I am very committed to improving my know-how in delivering solutions that deal with Big Data in a high-performance fashion. I am continuously looking for tools, algorithms, recipes (e.g. the Data Science e-book), papers and technology that enable this kind of processing, because it is considered to be relevant in the coming years, but it is already a reality!

Last week I started using R and rcmdr again to analyze and extract statistics from my PhD experiments using the Wilcoxon test. I started with R three years ago, when I developed a simple graphical interface in Visual Studio to input data and send operations to the R interpreter. The motivation for that work was to help a colleague with his final degree project, and the experience was very rewarding.

What is the relation between Big Data and R?

The explanation is simple: a key enabler for providing added-value services is to manage and learn from historical logs, so by putting together an excellent statistics suite and the Big Data realm it is possible to meet the requirements of a great variety of services in domains like NLP, recommendation, business intelligence, etc. For instance, there are approaches that mix R with Hadoop, such as RICARDO or Parallel R, and new companies such as Revolution Analytics are emerging to offer services based on R to process Big Data.

This post was a short introduction to R as a tool to exploit Big Data. If you’re interested in this kind of approach, please take a look at the following presentation by Lee Edfelsen:

Scalable Data Analysis in R Webinar Presentation

Keep on researching and learning!