WebIndex Launch

Today is the official launch of the Web Index. Over the last few months I have collaborated, through my activities in the WESO Research Group, with the Web Foundation to publish its statistical data following the Linked Data principles. I think we have published an appropriate version of this data and I hope to continue this fruitful collaboration with my new colleagues in the coming months.

You can find more information about the Web Index as Linked Data at http://data.webfoundation.org/.

If you have any comments, suggestions, etc., please feel free to contact me at any time.

Best,

Logic and Social Choice Theory

I have just finished an introduction to social choice theory, the formal study of mechanisms for collective decision making. The article presents all the background required to get started in this discipline. I reached this paper through the website of the UvA, and I believe the concepts presented in it could be applied to many use cases, for instance the matchmaking of organizations and public contracts, or perhaps mediation in Linked Data. I am now also very interested in works related to “The Wisdom of Crowds” and, to the best of my knowledge, this kind of research and tooling can help to validate the results generated by a crowd. There are also other relevant concepts and definitions that were new to me (in this context), such as dictatorship, liberalism, positive responsiveness or manipulation, among others. Furthermore, the author presents some logic concepts that have emerged to support social choice theory with a formal syntax and semantics; thus there are approaches based on FOL, HOL and other formalisms trying to formalize the underlying concepts of this theory. I would like to highlight the “Doctrinal Paradox”, which makes me feel uncomfortable with existing methods of judging: depending on whether a “premise-based procedure” or a “conclusion-based procedure” is used to reach a decision, the final result can change even though both procedures are correct (a small worked example follows below)!
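
To make the paradox concrete, here is a minimal sketch with made-up judge opinions, showing how the two procedures disagree when three judges decide on two premises P and Q and on the conclusion C = P AND Q:

```python
# Doctrinal paradox: three judges vote on two premises (P, Q); the
# conclusion is C = P AND Q. Premise-wise and conclusion-wise majority
# aggregation yield contradictory collective decisions.

# Hypothetical opinions: (P, Q) for each judge; C follows logically.
judges = [
    (True, True),    # judge 1: P yes, Q yes -> C yes
    (True, False),   # judge 2: P yes, Q no  -> C no
    (False, True),   # judge 3: P no,  Q yes -> C no
]

def majority(votes):
    """True iff a strict majority of the boolean votes is True."""
    return sum(votes) > len(votes) / 2

# Premise-based procedure: aggregate P and Q first, then derive C.
p_collective = majority([p for p, _ in judges])    # True  (2 of 3)
q_collective = majority([q for _, q in judges])    # True  (2 of 3)
premise_based = p_collective and q_collective      # -> True

# Conclusion-based procedure: each judge derives C, then aggregate.
conclusion_based = majority([p and q for p, q in judges])  # -> False

print("premise-based:   ", premise_based)      # True
print("conclusion-based:", conclusion_based)   # False: the paradox
```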

If you are interested in this kind of work or research, you can follow the autumn course on Computational Social Choice taught by Ulle Endriss as part of a master's programme in Logic.

Keep in touch!


Re-reasoning starting…

Over the last few days I have read a lot of papers about different topics such as FOL, inference, artificial intelligence, large-scale reasoning, rule engines, etc. I have collected all these references in the ROCAS wiki with the objective of keeping every relevant work that can help me to finally develop our semantic reasoner for large datasets.

This afternoon I found a paper entitled “Making Web-Scale Semantic Reasoning More Service-Oriented: The Large Knowledge Collider”, which presents the whole architecture of the well-known LarKC project; in some sense the ROCAS project was inspired by this European project, but with a restricted scope and different objectives. This paper has two main points of interest for me:

  1. One of the authors is Zhisheng Huang, who helped me in 2004 to use his Distributed Logic Programming system to animate humanoids when I was developing the final project for my Bachelor's degree.
  2. From a research point of view, the authors present a compilation of the works carried out during the execution of the LarKC project that should be relevant to ROCAS. It is not a research paper, but a good summary of the project.

This is a short post, but I want to highlight that the world is small enough to meet the same people (in this case researchers) again and again! It is incredible! 🙂

Finally, I would also like to report my first progress in the development. I have deployed a Hadoop job that performs the classical “Breadth-First Search” graph algorithm; this is one of the options I am considering for performing reasoning tasks (minimal sketches of this and of the backward-chaining idea follow the list below). The other approaches can be summarized as:

  1. Distribute the Jena rule engine (reusing its source code)
  2. Develop from scratch a typical backward-chaining engine using unification and resolution
  3. Mix 1 and 2 to avoid parsing rules, matching triples, etc.
  4. Build a graph (rules in backward chaining can be represented as an AND/OR tree) and try to infer new facts using unification and search.
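
For the Hadoop experiment, this is a minimal sketch of one BFS iteration expressed as a map/reduce pair; plain Python functions stand in for Hadoop streaming jobs, and the record layout (adjacency list, distance, state) is my own assumption, not the actual ROCAS code:

```python
# One BFS iteration in map/reduce style. Each record is
#   node_id -> (adjacency_list, distance_from_source, state)
# where state is "FRONTIER" (to expand), "VISITED" or "UNSEEN".
# A driver re-runs the job until no FRONTIER nodes remain.

INF = float("inf")

def mapper(node_id, record):
    adjacency, distance, state = record
    if state == "FRONTIER":
        # Expand the frontier: neighbours become reachable one hop further.
        for neighbour in adjacency:
            yield neighbour, ([], distance + 1, "FRONTIER")
        yield node_id, (adjacency, distance, "VISITED")
    else:
        # Pass every other node through unchanged.
        yield node_id, (adjacency, distance, state)

def reducer(node_id, records):
    # Merge the messages for a node: keep its adjacency list, the shortest
    # distance seen so far and the most advanced state.
    progress = {"UNSEEN": 0, "FRONTIER": 1, "VISITED": 2}
    adjacency, distance, state = [], INF, "UNSEEN"
    for adj, dist, st in records:
        adjacency = adj or adjacency
        distance = min(distance, dist)
        state = st if progress[st] > progress[state] else state
    return node_id, (adjacency, distance, state)
```

For reasoning, the same iterate-until-fixpoint pattern could propagate newly derived triples instead of distances.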
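
For approaches 2 and 4, a toy backward-chaining engine over flat atoms could look like the sketch below; the facts, the rule format and the predicate names are invented for illustration, and a production engine would of course need indexing, occurs checks and so on:

```python
import itertools

# Atoms are tuples of strings; terms starting with "?" are variables.
facts = {("parent", "tom", "bob"), ("parent", "bob", "ann")}
rules = [(("grandparent", "?x", "?z"),                        # head
          [("parent", "?x", "?y"), ("parent", "?y", "?z")])]  # body (AND)

fresh_ids = itertools.count()

def walk(term, bindings):
    """Follow a chain of variable bindings to its end."""
    while term.startswith("?") and term in bindings:
        term = bindings[term]
    return term

def substitute(atom, bindings):
    return tuple(walk(t, bindings) for t in atom)

def unify(a, b, bindings):
    """Unify two atoms term by term; return extended bindings or None."""
    bindings = dict(bindings)
    for x, y in zip(a, b):
        x, y = walk(x, bindings), walk(y, bindings)
        if x == y:
            continue
        if x.startswith("?"):
            bindings[x] = y
        elif y.startswith("?"):
            bindings[y] = x
        else:
            return None                      # constant clash
    return bindings

def rename(rule):
    """Give a rule fresh variable names so recursive calls cannot clash."""
    head, body = rule
    suffix = "_%d" % next(fresh_ids)
    ren = lambda t: t + suffix if t.startswith("?") else t
    return tuple(map(ren, head)), [tuple(map(ren, a)) for a in body]

def prove(goal, bindings=None):
    """OR branch: try every fact and every rule that matches the goal."""
    bindings = bindings or {}
    goal = substitute(goal, bindings)
    for fact in facts:
        if len(fact) == len(goal):
            b = unify(goal, fact, bindings)
            if b is not None:
                yield b
    for rule in rules:
        head, body = rename(rule)
        b = unify(head, goal, bindings) if len(head) == len(goal) else None
        if b is not None:
            yield from prove_all(body, b)

def prove_all(goals, bindings):
    """AND branch: prove every atom of a rule body in sequence."""
    if not goals:
        yield bindings
    else:
        for b in prove(goals[0], bindings):
            yield from prove_all(goals[1:], b)

for b in prove(("grandparent", "tom", "?z")):
    print(walk("?z", b))                     # -> ann
```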

Let’s rock it!

HPC-Europa2 visit

In February I applied to the HPC-Europa2 Transnational Access programme and I finally got a grant that gave me the opportunity to use the SARA infrastructure to test some algorithms. Thanks to Professor Maarten de Rijke I could select the University of Amsterdam as host, so now I am here for 6 weeks at the University of Amsterdam, in the Institute for Informatics and more specifically in the Intelligent Systems Lab. I am very excited about this opportunity and I will do my best to produce a good version of our reasoning prototype. Besides, I would like to start a fruitful collaboration between the people in this lab and our research group through publications, projects or whatever else comes up.

I would also like to thank all the administrative staff of the UvA for their time and consideration. When I arrived last Thursday, within 15 minutes I had a visitor card, a desk, a Wi-Fi connection and a great view…

I will keep you informed!

CFP: Data Mining on Linked Data workshop with Linked Data Mining Challenge

To be held during the 20th Int. Symposium on Methodologies for Intelligent Systems, ISMIS 2012, 4-7 December 2012, Macao (http://www.fst.umac.mo/wic2012/ISMIS/)

Official CFP

Workshop Scope

Over the past 3 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges in the area of intelligent information management: the exploitation of the Web as a platform for data and information integration, in addition to document search. Although numerous workshops have already emerged at the intersection of Data Mining and Linked Data (e.g. Know@LOD at ESWC) and challenges have been organized (USEWOD at WWW, http://data.semanticweb.org/usewod/2012/challenge.html), the particular setting chosen here (with a highly topical government-related dataset) will make it possible to explore new methods of exploiting Linked Data with state-of-the-art mining tools.

Workshop Topic

The workshop consists of an Open Track and a Challenge Track.
The Open Track expects submissions of regular research papers describing novel approaches to applying Data Mining techniques to Linked Data sources.

Participation in the Challenge Track will require the participants to download a real-world RDF dataset from the domain of Public Contract Procurement and accomplish at least one of the four pre-defined tasks on it using their own or publicly available data mining tools. To get access to the data, participants have to register for the Challenge Track at http://keg.vse.cz/ismis2012. A partial mapping to external datasets will also be available, which will allow the extraction of further potential features from the Linked Open Data cloud. Task 1 will amount to the unrestricted discovery of interesting nuggets in the (augmented) dataset. Task 2 will be similar, but the category of interesting hypotheses will be partially specified. Task 3 will concern the prediction of one of the features natively present in the training data (but only added to the evaluation dataset after the result submission). Task 4 will concern the prediction of a feature manually added to a sample of the data by a team of domain experts.
Participants will submit textual reports (Challenge Track papers) and, for Tasks 3 and 4, also the classification results.

Submissions

Both the research papers (submitted to the Open Track) and the Challenge Track papers should follow the Springer formatting style. The templates for Word and LaTeX are available at the workshop web page http://keg.vse.cz/ismis2012 and can also be found at http://www.springer.com/authors/book+authors?SGWID=0-154102-12-417900-0. The length of a submission should not exceed 10 pages. All papers will be made available on the workshop web pages and there will be post-conference proceedings for selected workshop papers.

Papers (and results for Tasks 3 and 4) should be submitted using EasyChair: http://www.easychair.org/conferences/?conf=ismis2012dmold.

Important Dates

  • Data ready for download: June 20, 2012
  • Workshop paper and result data submissions: August 10, 2012
  • Notification of Workshop paper acceptance: August 25, 2012
  • Workshop: December 4, 2012


CFP: Intellectual Capital Strategy Management for Knowledge-based Organizations

Call for Chapters

  • Proposals Submission Deadline: April 30, 2012
  • Full Chapters Due: June 15, 2012

Introduction

Strategy management addresses the forces and causes that explain performance differences between organisations. One approach studies industry structures as external determinants of organisational performance. An alternative approach focuses on internal resources and capabilities as sources of sustained competitive advantage. This is the resource and capabilities theory of the firm. On the other hand, the knowledge-based view of the firm considers the firm as a repository of knowledge-based resources and capabilities. To the extent that these resources and capabilities are unique, rare, difficult to imitate and non-substitutable, they confer sustained competitive advantage on the firm. Thus, in these approaches, organisational performance differences are a result of different stocks of knowledge-based resources and capabilities.

The intellectual capital literature, for its part, focuses on the measurement of the knowledge base of companies, organizations and regions. It also deals with building guidelines for the development of “intellectual capital accounts”, a corporate report informing about a firm's stock of knowledge-based resources.

Objectives of the book

The topics of Organizational Learning (OL), Knowledge Management (KM) and Intellectual Capital (IC) are receiving increased interest from both the academic community and companies because of the influence of innovation and learning on the achievement of a competitive advantage for the firm in the New Economy. The literature on knowledge management and intellectual capital suggests that competitive advantage flows from the creation, ownership, protection, storage and use of certain knowledge-based organisational resources. Superior organizational performance depends on the ability of firms, organizations and regions to be good at innovating, learning, protecting, deploying, amplifying and measuring this strategic intellect.

The objective of the book is to bring together a selection of new perspectives that collectively articulate a knowledge-based view of strategy management. It adopts a knowledge-based view that considers the role of companies, organizations and nations in the nurturing, deployment, storage and measurement of their knowledge.

The book aims to understand how policies and strategies for the management of knowledge-based resources (human capital, relational capital, structural capital) can contribute to the creation of a competitive advantage not only for companies and institutions but also for nations and economic regions. The adequate design of human resource management policies (“make”/“buy” policies), the strategic design and implementation of strategies for managing knowledge and learning in companies, as well as actions taken to measure and report on knowledge management, can make a difference and create a long-term competitive advantage for companies, nations and regions.

Topics of interest for the book

The book will be structured into 5 main sections:

  • Section 1. Intellectual Capital in Companies and Organizations
  • Section 2. Intellectual Capital Reports: New Trends in publishing
  • Section 3. Knowledge Management in Practice: Lessons to learn
  • Section 4. Human Resource Management Policies: Strategies to manage knowledge in companies and organizations
  • Section 5. Challenges for the Management of Knowledge-based Resources.

Topics include, but are not limited to, the following:

  • Case studies on human resource management
  • Economic development of nations and regions
  • Knowledge management theory
  • Knowledge creation
  • Knowledge-based resources
  • Human resource policies
  • “Make”/ “Buy” human resource systems
  • Human capital
  • Relational capital
  • Structural capital
  • Innovation
  • Organizational learning
  • Organizational unlearning

Submission Procedure

Chapter proposals (2 pages) are expected by April 30, 2012. Full chapters (20-25 pages) are expected to be submitted by June 15, 2012. All submitted chapters will be reviewed on a double-blind review basis. Contributors may also be requested to serve as reviewers for this project.

Publisher

This book is scheduled to be published by IGI Global (formerly Idea Group Inc.), publisher of the “Information Science Reference” (formerly Idea Group Reference) and “Medical Information Science Reference” imprints. For additional information regarding the publisher, please visit www.igi-global.com. This publication is anticipated to be released in 2013.

Important Dates

  • April 30, 2012: Chapter Proposal Submission
  • June 15, 2012: Full Chapter Submission
  • July 1, 2012: Review Results to Authors
  • July 15, 2012: Revised Chapter Submission
  • September 1, 2012: Final Acceptance Notifications

Inquiries and submissions can be forwarded electronically (as a Word document) to:

Patricia Ordóñez de Pablos
University of Oviedo, Spain
E-mail: patriop@uniovi.es

With CC to:
Robert D. Tennyson
University of Minnesota, USA
Email: rtenny@umn.edu

Jingyuan Zhao
University of Québec at Montréal, Canada
Email: jingyzh@gmail.com

CFP: Focussed Topic Issue on “New trends on E-Procurement applying Semantic Technologies”

Call for Papers: “Special issue on New trends on E-Procurement applying Semantic Technologies”

Overview

The aim of this special issue is to collect innovative and high-quality research and industrial contributions regarding E-Procurement processes that can fulfill the needs of this new realm. It explores the recent advances in the application of Semantic Technologies to the E-Procurement sector, soliciting original scientific contributions in the form of theoretical, experimental and applied research and case studies.

Important dates and Timeline

  • July 15, 2012: abstracts due (send directly to the guest editors)
  • September 1, 2012: invitations sent to authors to submit a full paper
  • February 28, 2013: full papers due (submit only through the Elsevier Editorial System)
  • May 1, 2013: first round of reviews
  • July 1, 2013: revised papers due (also submitted through the EES)
  • September 1, 2013: second round of reviews
  • November 1, 2013: final papers due (also through the EES)
If you need additional information, please contact the guest editors:
  •  Jose María Alvarez Rodríguez (Assistant Professor, University of Oviedo), Department of Computer Science, Faculty of Science, University of Oviedo, C/Calvo Sotelo, S/N, 33003, Oviedo, Asturias, Spain, E-mail: josem.alvarez@weso.es
  • José Emilio Labra Gayo (Associate Professor, University of Oviedo), Department of Computer Science, Faculty of Science, University of Oviedo, C/Calvo Sotelo, S/N, 33003, Oviedo, Asturias, Spain, E-mail: labra@uniovi.es
    http://www.di.uniovi.es/~labra
  • Patricia Ordoñez de Pablos (Associate Professor,  University of Oviedo), Department of Business Management, School of Economics and Business, University of Oviedo, Campus del Cristo – Avda. del Cristo, s/n 33006 Oviedo, Asturias, Spain, E-mail: patriop@uniovi.es

Hulu’s Recommendation System

Continuing the review of existing recommender systems in multimedia sites, I have found, through Marcos Merino, the recommendation engine provided by Hulu (an online video service that offers a selection of hit shows, clips, movies and more).

It brings together a large selection of videos from over 350 content companies, including FOX, NBCUniversal, ABC, The CW, Univision, Criterion, A&E Networks, Lionsgate, Endemol, MGM, MTV Networks, Comedy Central, National Geographic, Digital Rights Group, Paramount, Sony Pictures, Warner Bros., TED and more. (Hulu, About)

But what is the underlying technology behind Hulu?

Checking their technology blog, they have put a lot of effort into providing a great recommendation engine, in which they have decided to recommend shows to users instead of individual videos; contents can be organized this way because videos from the same show are usually closely related. As with Netflix, one of the drivers of the recommendation is user behavior data (implicit and explicit feedback). The algorithm implemented in Hulu is based on a collaborative filtering approach (user- or item-based), but the most important part lies in Hulu's architecture, which comprises the following components:

  1. User profile builder
  2. Recommendation core
  3. Filtering
  4. Ranking
  5. Explanation
Besides, they have an off-line system for data processing that supports the aforementioned processes; it is based on a data center, a topic model, a related-table generator, a feedback analyzer and a report generator. On top of these components and processes, they have applied an item-based collaborative filtering algorithm to make recommendations (a toy sketch of this technique follows the quote below). One of the key points used to evaluate recommendations is “Novelty”:

Just because a recommendation system can accurately predict user behavior does not mean it produces a show that you want to recommend to an active user. (Hulu, Tech Blog)
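
To make the core idea concrete, here is a minimal sketch of item-based collaborative filtering with cosine similarity over a toy user-show matrix; the data and function names are my own illustration, not Hulu's actual system:

```python
import numpy as np

# Toy user-show matrix: rows = users, columns = shows, values = ratings
# (0 means "not watched"). Purely illustrative data.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

def item_similarity(r):
    """Cosine similarity between item (column) vectors."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0                  # guard against empty columns
    unit = r / norms
    return unit.T @ unit

def recommend(r, sim, user, top_n=3):
    """Score unseen items by a similarity-weighted sum of the user's ratings."""
    seen = r[user] > 0
    scores = sim @ r[user]                   # aggregate over the rated items
    scores[seen] = -np.inf                   # never re-recommend seen items
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked if np.isfinite(scores[i])][:top_n]

sim = item_similarity(ratings)
print(recommend(ratings, sim, user=0))       # -> [2], the only unseen show
```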

Other key points of their approach lie in explanation-based diversity and temporal diversity. This situation demonstrates that the problems of recommending information resources in different domains are always similar. Nevertheless, depending on the domain (user behavior, type of content, etc.), new metrics can emerge, such as novelty. On the other hand, real-time capabilities, off-line processing and performance, apart from accuracy, are again key enablers of a “good” recommendation engine. Some interesting lessons from Hulu's experience are highlighted below:

  • Explicit feedback data is more important than implicit feedback data
  • Recent behaviors are much more important than old behaviors
  • Novelty, diversity and offline accuracy are all important factors
  • Most researchers focus on improving offline accuracy metrics such as RMSE or precision/recall. However, a recommendation system that can accurately predict user behavior alone may not be good enough for practical use; a good recommendation system should consider multiple factors together. According to Hulu, after considering novelty and diversity, the CTR improved by more than 10%. Please check out the document “Automatic Generation of Recommendations from Data: A Multifaceted Survey” (a technical report from the School of Information Technology at Deakin University, Australia)
But in which components or processes can semantic technologies help recommenders?
Taking into account the main drivers of the Semantic Web, the use of semantics can be part of some processes (see Mining Data Semantics, MDS2012) such as:
  • Classification and prediction in heterogeneous networks
  • Pattern-analysis methods
  • Link mining and link prediction
  • Semantic search over heterogeneous networks
  • Mining with user interactions
  • Semantic mining with light-weight reasoning
  • Extending LOD and quality of LOD: disambiguation, identity, provenance, integration
  • Personalized mining in heterogeneous networks
  • Domain specific mining (e.g., Life Science and Health Care)
  • Collective intelligence mining
Finally, I will continue reviewing the main recommendation services of large social networks and web services (sorted by name) such as Amazon, Facebook, Foursquare, LinkedIn, Mendeley or Twitter, in order to make a complete comparison according to different variables and features of the algorithms: feedback, real time, domain, user behavior, etc. After that, my main objective will be to implement a real use case in a distributed environment, merging semantic technologies and recommendation algorithms, to test whether semantics can improve the results (accuracy, etc.) of existing approaches.

BellKor Solution to the Netflix Prize

Currently, users are inundated with information and data coming from products and services. Recommender systems have emerged as a research area in recent years, but with huge importance for any commercial application. A simple classification of these techniques distinguishes pushing-based recommendations, neighborhood models (user-user or item-user based) and latent factor models such as simple matrix factorization. The main challenge in improving these techniques is to obtain more accurate models in which information regarding item biases, user biases and user preference biases is taken into account.

Collaborative filtering is a prime component for recommending products and services. Basically, the neighborhood approach and latent factor models (such as Singular Value Decomposition, SVD) are the main approaches to comparing users and items: the former focuses on computing relationships between items or between users, while the latter translates items and users into the same latent factor space, thus making them directly comparable.
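
As a reference for the latent factor family, below is a minimal sketch of a biased matrix factorization model (prediction = global mean + user bias + item bias + dot product of the latent factors) trained with stochastic gradient descent and evaluated with RMSE; the data and hyperparameters are made up for illustration, and this shows the general style of SVD-based model, not the actual BellKor implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings: (user, item, rating). Purely illustrative data.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2                # k latent factors

mu = np.mean([r for _, _, r in ratings])     # global mean
b_u = np.zeros(n_users)                      # user biases
b_i = np.zeros(n_items)                      # item biases
P = 0.1 * rng.standard_normal((n_users, k))  # user factors
Q = 0.1 * rng.standard_normal((n_items, k))  # item factors

lr, reg = 0.01, 0.02                         # learning rate, regularization

for epoch in range(200):
    for u, i, r in ratings:
        pred = mu + b_u[u] + b_i[i] + P[u] @ Q[i]
        e = r - pred
        # SGD updates for the regularized squared error
        b_u[u] += lr * (e - reg * b_u[u])
        b_i[i] += lr * (e - reg * b_i[i])
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                      Q[i] + lr * (e * P[u] - reg * Q[i]))

# RMSE on the training data (a real setup would use a held-out set)
errors = [(r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])) ** 2
          for u, i, r in ratings]
print("train RMSE:", np.sqrt(np.mean(errors)))
```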

After this short review of the main approaches to collaborative filtering, I am going to focus on the subject of this post, “The BellKor Solution to the Netflix Prize” [1]. The Netflix Prize was a contest to improve the accuracy of the Cinematch algorithm, measured with the quality metric RMSE, with a prize of up to $1M. The authors of the winning solution (Bob Bell and Chris Volinsky, from the Statistics Research group at AT&T Labs, and Yehuda Koren) won the prize with the first approach that merges both models (neighborhood and SVD), obtaining a more accurate model. Some of the main features of this approach are:

  • a new model for neighborhood recommenders based on optimizing a global cost function, keeping advantages such as explainability and the handling of new users while improving accuracy
  • a set of extensions to existing SVD models to integrate implicit feedback and temporal features

Thus, a new approach for recommender systems was presented in 2008-2009 (a complete description of the algorithm is available in the article “Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model”) to win the Netflix contest, but some issues remain open:

  • Scalability (millions of users and items) and real time (map/reduce techniques to continuously process new data)
  • Explainability
  • Implicit and explicit feedback
  • Factorization techniques (please read this article from the same authors)
  • Quality, including more data regarding dates, attributes of users, etc.
  • …in general, recommender systems are a young area in which a lot of improvements can be implemented

Finally, it is also worth checking the latest publications of Yehuda Koren.

Technologies Cloud

After seven years I have gained expertise in some research domains and technologies… I believe a cloud can explain it properly!