Continuing the review of existing recommender systems on multimedia sites, I came across (through Marcos Merino) the recommendation engine provided by Hulu, an online video service that offers a selection of hit shows, clips, movies and more.
It brings together a large selection of videos from over 350 content companies, including FOX, NBCUniversal, ABC, The CW, Univision, Criterion, A&E Networks, Lionsgate, Endemol, MGM, MTV Networks, Comedy Central, National Geographic, Digital Rights Group, Paramount, Sony Pictures, Warner Bros., TED and more. (Hulu, About)
But what is the underlying technology at Hulu?
According to Hulu’s tech blog, the company has put a lot of effort into building a good recommendation engine, and it decided to recommend shows to users rather than individual videos. This makes content easier to organize, because videos from the same show are usually closely related. As with Netflix, one of the drivers of the recommendations is user behavior data (implicit and explicit feedback). The algorithm implemented at Hulu is based on a collaborative filtering approach (user- or item-based), but the most important part lies in Hulu’s architecture, which comprises the following components:
- User profile builder
- Recommendation core
- Filtering
- Ranking
- Explanation
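To make the five stages more concrete, here is a minimal Python sketch of how they might fit together. It is not Hulu’s code: the toy watch histories, the cosine similarity over co-occurrence counts, and every name in it (`HISTORY`, `build_profile`, `related_table`, `recommend`) are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical watch histories: user -> set of shows (toy data, not Hulu's).
HISTORY = {
    "ana": {"House", "Glee", "Bones"},
    "ben": {"House", "Bones"},
    "cat": {"Glee", "Family Guy"},
    "dan": {"House", "Bones", "Family Guy"},
}


def build_profile(user):
    """User profile builder: here simply the set of shows the user watched."""
    return HISTORY.get(user, set())


def related_table(history):
    """Item-item cosine similarity computed from co-occurrence counts."""
    count = defaultdict(int)  # number of users who watched each show
    co = defaultdict(int)     # number of users who watched both shows of a pair
    for shows in history.values():
        for show in shows:
            count[show] += 1
        for a, b in combinations(sorted(shows), 2):
            co[(a, b)] += 1
    table = defaultdict(dict)
    for (a, b), n in co.items():
        sim = n / sqrt(count[a] * count[b])
        table[a][b] = sim
        table[b][a] = sim
    return table


def recommend(user, table, top_n=3):
    profile = build_profile(user)

    # Recommendation core: add up the similarity of each candidate show
    # to every show in the user's profile.
    scores = defaultdict(float)
    reasons = {}  # candidate -> (most similar watched show, similarity)
    for seen in sorted(profile):
        for cand, sim in table.get(seen, {}).items():
            scores[cand] += sim
            if cand not in reasons or sim > reasons[cand][1]:
                reasons[cand] = (seen, sim)

    # Filtering: drop shows the user has already watched.
    candidates = {c: s for c, s in scores.items() if c not in profile}

    # Ranking: highest score first, ties broken alphabetically.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

    # Explanation: attach the watched show that contributed most.
    return [
        (show, round(score, 3), f"because you watched {reasons[show][0]}")
        for show, score in ranked
    ]


table = related_table(HISTORY)
for show, score, why in recommend("ben", table):
    print(show, score, why)
```

In a real deployment the related table would presumably be produced by the off-line system and only looked up on-line; here it is built in memory to keep the sketch self-contained.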
In addition, an off-line data processing system supports the aforementioned processes; it is built on a data center, a topic model, a related table generator, a feedback analyzer and a report generator. On top of these components and processes, they apply an item-based collaborative filtering algorithm to make recommendations. One of the key points when evaluating recommendations is “Novelty”:
Just because a recommendation system can accurately predict user behavior does not mean it produces a show that you want to recommend to an active user. (Hulu, Tech Blog)
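The post does not say how novelty is measured at Hulu. One popularity-based proxy that appears in the recommender literature is the mean self-information of the recommended items: the fewer users have watched a show, the more "novel" it counts. The sketch below assumes that definition; the function name `novelty` and the watch counts are hypothetical.

```python
from math import log2


def novelty(recommended, watch_counts, n_users):
    """Mean self-information (in bits) of the recommended shows.

    Higher values mean the list leans towards less popular shows.
    This is an assumed proxy for novelty, not a metric taken from Hulu.
    """
    if not recommended:
        return 0.0
    total = 0.0
    for show in recommended:
        # Fraction of users who watched the show; shows nobody watched
        # are counted as watched once to avoid log2(0).
        p = max(watch_counts.get(show, 0), 1) / n_users
        total += -log2(p)
    return total / len(recommended)


# Hypothetical popularity data.
WATCH_COUNTS = {"Hit Show": 800, "Niche Documentary": 25}
N_USERS = 1000

print(novelty(["Hit Show"], WATCH_COUNTS, N_USERS))
print(novelty(["Niche Documentary"], WATCH_COUNTS, N_USERS))
```

A list made only of very popular shows scores close to zero, which matches the intuition in the quote: such a list may be accurate without telling the user anything new.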
Other key points of their approach are explanation-based diversity and temporal diversity. This suggests that the problems of recommending information resources are similar across domains. Nevertheless, depending on the domain (user behavior, type of content, etc.), new metrics such as novelty can emerge. On the other hand, real-time capabilities, off-line processing and performance are, apart from accuracy, again key enablers of a “good” recommendation engine. Some interesting lessons from Hulu’s experience are highlighted below:
- Explicit Feedback data is more important than implicit feedback data
- Recent behaviors are much more important than old behaviors
- Novelty, Diversity, and offline Accuracy are all important factors
- Most researchers focus on improving offline accuracy, such as RMSE, precision/recall. However, recommendation systems that can accurately predict user behavior alone may not be good enough for practical use. A good recommendation system should consider multiple factors together. In Hulu’s system, after considering novelty and diversity, the CTR improved by more than 10%.
Please check this document out: “Automatic Generation of Recommendations from Data: A Multifaceted Survey” (a technical report from the School of Information Technology at Deakin University, Australia).
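The first two lessons in the list above (explicit feedback outweighs implicit feedback, and recent behavior outweighs old behavior) can be sketched as a weighting scheme for the user profile. The concrete weights, the 30-day half-life and the names `Event`, `decayed_weight` and `build_weighted_profile` are assumptions for illustration; the source gives no numbers.

```python
from dataclasses import dataclass

# Hypothetical weights: an explicit rating counts three times as much
# as an implicit watch event.
EVENT_WEIGHT = {"rating": 3.0, "watch": 1.0}

# Hypothetical half-life: an event loses half its weight every 30 days.
HALF_LIFE_DAYS = 30.0


@dataclass
class Event:
    show: str
    kind: str        # "rating" (explicit) or "watch" (implicit)
    age_days: float  # how long ago the event happened


def decayed_weight(event):
    """Weight of one event: type weight times exponential time decay."""
    decay = 0.5 ** (event.age_days / HALF_LIFE_DAYS)
    return EVENT_WEIGHT[event.kind] * decay


def build_weighted_profile(events):
    """Sum the decayed weights per show to get a user profile."""
    profile = {}
    for event in events:
        profile[event.show] = profile.get(event.show, 0.0) + decayed_weight(event)
    return profile


events = [
    Event("House", "rating", 0),   # fresh explicit feedback
    Event("Glee", "watch", 0),     # fresh implicit feedback
    Event("Bones", "rating", 60),  # explicit feedback from two months ago
]
print(build_weighted_profile(events))
```

Exponential decay is only one possible choice; a sliding window or a linear decay would express the same lesson with different trade-offs.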
But in which components or processes can semantic technologies help recommenders?
Taking into account the main drivers of the semantic web, semantics can play a part in several processes (Mining Data Semantics-MDS2012), such as:
- Classification and prediction in heterogeneous networks
- Pattern-analysis methods
- Link mining and link prediction
- Semantic search over heterogeneous networks
- Mining with user interactions
- Semantic mining with light-weight reasoning
- Extending LOD and Quality of LOD: disambiguation, identity, provenance, integration
- Personalized mining in heterogeneous networks
- Domain specific mining (e.g., Life Science and Health Care)
- Collective intelligence mining
- …
Finally, I will continue reviewing the main recommendation services of large social networks (sorted by name), such as Amazon, Facebook, Foursquare, LinkedIn, Mendeley and Twitter, in order to make a complete comparison across different variables and features of the algorithms: feedback, real time, domain, user behavior, etc. After that, my main objective will be to implement a real use case in a distributed environment, merging semantic technologies and recommendation algorithms, to show whether semantics can improve the results (accuracy, etc.) of existing approaches.