semantic – Chema's Home Page

Following with the review of existing recommending systems in multimedia sites I have found through Marcos Merino the recomendation engine provide by HULU (it is an online video service that offers a selection of hit shows, clips, movies and more).

It brings together a large selection of videos from over 350 content companies, including FOX, NBCUniversal, ABC, The CW, Univision, Criterion, A&E Networks, Lionsgate, Endemol, MGM, MTV Networks, Comedy Central, National Geographic, Digital Rights Group, Paramount, Sony Pictures, Warner Bros., TED and more. (Hulu, About)

But, which is the underlying technology in Hulu?

Checking the technological blog they have spent a lot of effort to provide a great recommending engine in which they have decided to recommend shows to users instead of individual videos. Thus, contents can be organized due to same shows videos are usually closely related. As well as Netflix one of the drivers of the recommendation is the user behavior data (implicit and explicit feedback). The algorithm implemented in Hulu is based on a collaborative filtering approach (user or item based) but the most important part lies in Hulu’s architecture which is comprised of the next components:

User profile builder
Recommendation core
Filtering
Ranking
Explanation

Besides they have an off-line system for data processing that supports aforementioned processes and it is based on a data center, a topic model, a related table generator, a feedback analyzer and a report generator. According to these components and processes they have been applied an item-based collaborative filtering algorithm to make recommendations. One of the keypoints to evaluate recommendations is “Novelty”:

Just because a recommendation system can accurately predict user behavior does not mean it produces a show that you want to recommend to an active user. (Hulu, Tech Blog)

Other key points of their approach lies in explanation-based diversity and temporal diversity. This situation demonstrates that existing problems of recommending information resources in different domains are always similar. Nevertheless, depending on the domain (user behavior, type of content, etc.) new metrics can emerge such as novelty. On the other hand, real time capabilities, off-line processing and performance are again key-enablers of a “good” recommendation engine apart from accuracy. Following some interesting lessons from Hulu’s experience are highlighted:

Explicit Feedback data is more important than implicit feedback data
Recent behaviors are much more important than old behaviors
Novelty, Diversity, and offline Accuracy are all important factors
Most researchers focus on improving offline accuracy, such as RMSE, precision/recall. However, recommendation systems that can accurately predict user behavior alone may not be a good enough for practical use. A good recommendation system should consider multiple factors together. In our system, after considering novelty and diversity, the CTR has improved by more than 10%. Please check this document out: “Automatic Generation of Recommendations from Data: A Multifaceted Survey” (a technical report from the School of Information Technology at Deakin | University Australia)

But, in which components or processes semantic technologies can help recommenders?

Taking into account the main drivers of the semantic web, the use of semantics can be part of some processes (Mining Data Semantics-MDS2012) such as:

Classification and prediction in heterogeneous networks
Pattern-analysis methods
Link mining and link prediction
Semantic search over heterogeneous networks
Mining with user interactions
Semantic mining with light-weight reasoning
Extending LOD and Quality of LOD disambiguation, identity, provenance, integration
Personalized mining in heterogeneous networks
Domain specific mining (e.g., Life Science and Health Care)
Collective intelligence mining
…

Finally, I will continue reviewing main recommendation services of large social networks (sort by name) such as Amazon, Facebook, Foursquare, Linkedin, Mendeley or Twitter to finally make a complete comparison according to different variables and features of the algorithms: feedback, real time, domain, user behavior, etc. After that my main objective will be make an implementation of a real use case in a distributed environment merging semantic technologies and recommendation algorithms to demonstrate if semantics can improve results (accuracy, etc.) of existing approaches.