An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms
Abstract. Collaborative filtering systems predict a user’s interest in new items based on the recommendations of other people with similar interests. Instead of performing content indexing or content analysis, collaborative filtering systems rely entirely on interest ratings from members of a participating community. Since predictions are based on human ratings, collaborative filtering systems have the potential to provide filtering based on complex attributes, such as quality, taste, or aesthetics. Many implementations of collaborative filtering apply some variation of the neighborhood-based prediction algorithm. Many variations of similarity metrics, weighting approaches, combination measures, and rating normalization have appeared in each implementation. For these parameters and others, there is no consensus as to which choice of technique is most appropriate for what situations, nor how significant an effect on accuracy each parameter has. Consequently, every person implementing a collaborative filtering system must make hard design choices with little guidance. This article provides a set of recommendations to guide design of neighborhood-based prediction systems, based on the results of an empirical study. We apply an analysis framework that divides the neighborhood-based prediction approach into three components and then examines variants of the key parameters in each component. The three components identified are similarity computation, neighbor selection, and rating combination.
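For concreteness, one common way to instantiate the three components is written out below. These formulas describe one typical configuration. They are an assumption for illustration, not the specific variants the article evaluates; the article examines several variants of the key parameters in each component.

The notation is as follows. $r_{u,i}$ is user $u$'s rating of item $i$, and $\bar r_u$ is the mean of all of $u$'s ratings. $I_{a,u}$ is the set of items rated by both the active user $a$ and user $u$. $k$ is the neighborhood size. The significance-weighting cut-off $\gamma$ is a hypothetical parameter; setting $w'_{a,u} = w_{a,u}$ disables it.

```latex
% Component 1: similarity computation (mean-centered, Pearson-style correlation)
w_{a,u} = \frac{\sum_{i \in I_{a,u}} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}
               {\sqrt{\sum_{i \in I_{a,u}} (r_{a,i} - \bar{r}_a)^2}\,
                \sqrt{\sum_{i \in I_{a,u}} (r_{u,i} - \bar{r}_u)^2}}

% Optional significance weighting: devalue correlations built on few co-rated items
w'_{a,u} = w_{a,u} \cdot \frac{\min(|I_{a,u}|, \gamma)}{\gamma}

% Component 2: neighbor selection (best-k users who rated item i, ranked by w'_{a,u}, positive weights only)
N_k(a,i) = \operatorname{top}_k \{\, u \ne a : r_{u,i} \text{ exists},\ w'_{a,u} > 0 \,\}

% Component 3: rating combination (deviation-from-mean weighted average)
p_{a,i} = \bar{r}_a + \frac{\sum_{u \in N_k(a,i)} (r_{u,i} - \bar{r}_u)\, w'_{a,u}}
                          {\sum_{u \in N_k(a,i)} |w'_{a,u}|}
```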
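This paper is an empirical analysis of recommender systems built on collaborative filtering algorithms. It targets a specific gap: there is still no clear consensus on which parameters to use under which conditions, or on how much each parameter actually affects the results. The authors investigate this experimentally. They examine and analyze three key components of the recommender: similarity computation, neighbor selection, and rating combination. The article is best read as a combination of survey and empirical study. It has been cited more than 500 times, which suggests its conclusions are well worth consulting.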
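The three components can be tried out with a minimal Python sketch of one common configuration:

- a mean-centered (Pearson-style) similarity with optional significance weighting;
- best-k neighbor selection restricted to positive weights;
- a deviation-from-mean weighted average.

The function names, the parameters `k` and `gamma`, and the toy ratings are hypothetical illustrations, not the authors' code or data. The variants chosen here are only one point in the design space the article explores.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical data layout (an assumption): user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def user_mean(ratings: Ratings, user: str) -> float:
    """Mean of all ratings given by `user` (assumes at least one rating)."""
    values = list(ratings[user].values())
    return sum(values) / len(values)


def similarity(ratings: Ratings, a: str, u: str, gamma: int = 0) -> float:
    """Component 1: mean-centered (Pearson-style) correlation over co-rated items.

    Means are taken over all of each user's ratings (one design choice).
    If gamma > 0, correlations resting on fewer than `gamma` co-rated items
    are scaled by len(common) / gamma (significance weighting).
    """
    common = ratings[a].keys() & ratings[u].keys()
    if not common:
        return 0.0
    ma, mu = user_mean(ratings, a), user_mean(ratings, u)
    num = sum((ratings[a][i] - ma) * (ratings[u][i] - mu) for i in common)
    da = math.sqrt(sum((ratings[a][i] - ma) ** 2 for i in common))
    du = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    if da == 0 or du == 0:
        return 0.0  # no variance on the co-rated items: treat as uncorrelated
    w = num / (da * du)
    if gamma > 0:
        w *= min(len(common), gamma) / gamma
    return w


def select_neighbors(ratings: Ratings, a: str, item: str, k: int,
                     gamma: int = 0) -> List[Tuple[str, float]]:
    """Component 2: the k most similar users who rated `item`, positive weights only."""
    candidates = []
    for u in ratings:
        if u == a or item not in ratings[u]:
            continue
        w = similarity(ratings, a, u, gamma)
        if w > 0:
            candidates.append((u, w))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


def predict(ratings: Ratings, a: str, item: str, k: int = 20, gamma: int = 0) -> float:
    """Component 3: deviation-from-mean weighted average of the neighbors' ratings."""
    base = user_mean(ratings, a)
    neighbors = select_neighbors(ratings, a, item, k, gamma)
    denom = sum(abs(w) for _, w in neighbors)
    if denom == 0:
        return base  # no usable neighbors: fall back to the active user's mean
    num = sum((ratings[u][item] - user_mean(ratings, u)) * w for u, w in neighbors)
    return base + num / denom


# Hypothetical toy ratings on a 1-5 scale.
toy_ratings: Ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 4, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 2},
}

print(predict(toy_ratings, "alice", "m4"))  # → 4.5
```

In the toy data, carol correlates negatively with alice and is filtered out, so bob is the only neighbor. The prediction is therefore alice's mean (4) plus bob's deviation on m4 (4 - 3.5), giving 4.5. Each of `k`, `gamma`, or the similarity function can be changed while the other two components stay fixed, which mirrors the component-by-component analysis described above.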
|
Powered by ScienceNet.cn
Copyright © 2007- China Science Daily (中国科学报社)