Bruno Pedro


Collaborative filtering

Are you tired of your feed reader? Do you wish you could find more interesting posts, or perhaps new blogs related to your current tastes and preferences?

Apparently Dave Winer feels the same way:

I want rating services to provide clues about what I should be subscribing to. I want them to find not what’s popular with the masses but what will be valuable to me.

He then hits the sweet spot:

It’s a simple matter to apply collaborative filtering to this problem, we’ve even done it in SYO. These ideas need revisiting now that everyone else seems to have caught on that this is a problem worth solving.

Paolo Avesani, who’s already been studying this subject for some time, understands that tags alone are not enough to propose recommendations. Quoting the paper “An Analysis of the Use of Tags in a Blog Recommender System” [Hayes et al., 2007] (PDF):

In the blog domain, however, we find that tags are rather poor at partitioning blog data. Using content-based clustering, we observe that a small proportion of users in every cluster have independently used the same tag tokens to describe his/her posts.

We definitely need something new. What about using collaborative filtering algorithms to learn users’ tastes and, eventually, recommend interesting content to them? The Pearson correlation coefficient, used as a measure of similarity between two users, is probably a good candidate.

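$$w_{a,u}=\frac{\sum_{i=1}^m(r_{a,i}-\bar{r}_a)\times(r_{u,i}-\bar{r}_u)}{{\sigma}_a\times{\sigma}_u}$$

Here $w_{a,u}$ is the similarity weight between users $a$ and $u$, $r_{a,i}$ is the rating user $a$ gave to item $i$, $\bar{r}_a$ is that user’s mean rating, and the sum runs over the $m$ items both users have rated. For the weight to stay between -1 and 1, $\sigma_a$ has to be read as $\sqrt{\sum_{i=1}^m(r_{a,i}-\bar{r}_a)^2}$ (and likewise for $\sigma_u$); with ordinary standard deviations the numerator would also need to be divided by $m$.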
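To make the formula a bit more concrete, here is a minimal Python sketch of the weight between two users. It is only an illustration under a few assumptions of mine: each user’s ratings live in a dictionary keyed by blog name, the denominator uses the square root of the summed squared deviations so that the result falls between -1 and 1, and users with fewer than two blogs in common get a weight of 0. The function name `pearson_similarity` and the sample ratings are made up for the example.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_u):
    """Pearson correlation between two users over the items both have rated."""
    # Only items rated by both users take part in the sum (the m items).
    shared = sorted(set(ratings_a) & set(ratings_u))
    m = len(shared)
    if m < 2:
        # Assumption: too little overlap is treated as "no similarity".
        return 0.0

    mean_a = sum(ratings_a[i] for i in shared) / m
    mean_u = sum(ratings_u[i] for i in shared) / m

    # Numerator: sum of the products of each user's deviation from their mean.
    numerator = sum(
        (ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in shared
    )

    # Denominator: square roots of the summed squared deviations.
    dev_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in shared))
    dev_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in shared))
    if dev_a == 0 or dev_u == 0:
        # A user who rates everything the same has no measurable taste profile.
        return 0.0

    return numerator / (dev_a * dev_u)


# Hypothetical ratings (1 to 5) that three readers gave to a few blogs.
alice = {"blog_a": 5, "blog_b": 3, "blog_c": 4, "blog_d": 1}
bob = {"blog_a": 4, "blog_b": 2, "blog_c": 3, "blog_e": 5}
carol = {"blog_a": 1, "blog_b": 5, "blog_c": 2}

alice_bob = pearson_similarity(alice, bob)      # close to 1: similar tastes
alice_carol = pearson_similarity(alice, carol)  # negative: opposite tastes
```

Bob rates every shared blog exactly one point lower than Alice, so their weight comes out at (almost exactly) 1 even though the raw numbers differ; subtracting each user’s mean is what makes the measure insensitive to harsh or generous raters. A recommender could then suggest `blog_e` to Alice because a like-minded reader rated it highly.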

I suggest watching Tayfun Şen’s excellent presentation about collaborative filtering.