Social Web and Recommendation
Published on February 14, 2011
Damien Poirier defended his thesis in Orleans, France, on Friday, Feb. 12. Damien started this thesis with me in 2007 at Orange Labs in Lannion.
Unstructured data from the Social Web are particularly interesting for recommendation. Through social media, internet users share their opinions on products, movies, and music in the form of reviews or comments (free text).
The central question of this thesis is: can we harness the opinions published on the Social Web to enrich a recommender system for cultural goods?
The thesis concerns the transformation of unstructured text data into structured data usable by recommender systems. Starting from textual reviews posted on community sites (blogs or forums), the objective is to build usage matrices that can serve as relevant inputs to recommender systems. The underlying idea is to enrich a new system during its ‘cold start period’, when it still has too few users of its own and therefore little usage data. The enrichment is done with data from external users.
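To make the idea of a usage matrix concrete, here is a minimal sketch. It assumes that an opinion classifier has already labeled each review as positive or negative; the function name, the sample data, and the mapping from polarity to a numeric rating are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical mapping from a predicted polarity to a numeric rating.
POLARITY_TO_RATING = {"negative": 1, "positive": 5}


def build_usage_matrix(labeled_reviews):
    """Turn (user, item, polarity) triples into a nested dict user -> item -> rating."""
    matrix = defaultdict(dict)
    for user, item, polarity in labeled_reviews:
        # If a user reviewed the same item twice, the last review wins.
        matrix[user][item] = POLARITY_TO_RATING[polarity]
    return {user: dict(items) for user, items in matrix.items()}


# Made-up reviews, already labeled by some opinion classifier.
reviews = [
    ("alice", "Movie A", "positive"),
    ("alice", "Movie B", "negative"),
    ("bob", "Movie A", "positive"),
]
usage = build_usage_matrix(reviews)
```

The resulting sparse structure (most user-item pairs are missing) is the kind of input a collaborative filtering system could consume in place of its own, still too scarce, usage logs.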
What I take from this work:
- In their ‘cold start’ state, recommendation systems generally use a content filtering approach: recommendations come from matching content descriptions against users' interest profiles. Damien’s approach, building usage data from external social data, proves to be more effective. This result is interesting, especially because it validates the original idea (mine) that opinions published by Internet users could be useful for recommendation systems.
- The experiments suggest that the dataset opens the way to the analysis of bipartite graphs. Regardless of the ratings that users give to movies, the mere fact of rating a movie is ‘informative’ for predicting a rating. What about analyzing the user-movie graph?
- Regarding opinion mining, the thesis demonstrates that on the particular corpus used (free-text comments in English), linguistic preprocessing contributes nothing to the quality of the recommendations (measured with RMSE). Simply removing the least frequent words and converting letters to lowercase is enough. And that is good news, because linguistic preprocessing takes time.
- The SVM (Support Vector Machine) classifier is particularly well suited to sparse, high-dimensional data. This is verified with opinion mining on textual comments from community sites (175,000 comments; the films are described by 60,000 terms after preprocessing).
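As an illustration of the light preprocessing and the RMSE measure mentioned in the points above, here is a minimal sketch. The tokenization rule, the `min_count` threshold, and the sample data are assumptions made for this example, not the settings used in the thesis.

```python
import math
import re
from collections import Counter


def preprocess(comments, min_count=2):
    """Lowercase each comment and drop words seen fewer than min_count times overall."""
    tokenized = [re.findall(r"[a-z']+", comment.lower()) for comment in comments]
    counts = Counter(word for tokens in tokenized for word in tokens)
    return [[w for w in tokens if counts[w] >= min_count] for tokens in tokenized]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Made-up comments and ratings, for illustration only.
comments = ["Great movie, GREAT cast", "Boring movie", "Great soundtrack"]
tokens = preprocess(comments)
error = rmse([4.0, 2.0, 5.0], [5.0, 2.0, 3.0])
```

The surviving terms would then become the dimensions of a sparse term vector per comment, the kind of representation on which an SVM is trained; a lower RMSE means the predicted ratings are closer to the real ones.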
Like the other members of the jury, I strongly recommend reading his thesis, which is very didactic and very thorough. In particular, the extremely rigorous experimental work is a treat and can serve as an entry point for a non-specialist reader interested in learning the methodologies and techniques of opinion mining on the one hand, and in understanding recommendation systems on the other.