TY - JOUR
T1 - A maximum entropy approach for collaborative filtering
AU - Browning, John
AU - Miller, David J.
N1 - Funding Information:
This work was supported by National Science Foundation grants IIS-9624870 and IIS-0082214.
PY - 2004/6
Y1 - 2004/6
AB - Collaborative filtering (CF) involves predicting the preferences of a user for a set of items given partial knowledge of the user's preferences for other items, while leveraging a database of profiles for other users. CF has applications e.g. in predicting Web sites a person will visit and in recommending products. Fundamentally, CF is a pattern recognition task, but a formidable one, often involving a huge feature space, a large data set, and many missing features. Even more daunting is the fact that a CF inference engine must be capable of predicting any (user-selected) items, given any available set of partial knowledge on the user's other preferences. In other words, the model must be designed to solve any of a huge (combinatoric) set of possible inference tasks. CF techniques include memory-based, classification-based, and statistical modelling approaches. Among these, modelling approaches scale best with large data sets and are the most adept at handling missing features. The disadvantage of these methods lies in the statistical assumptions (e.g. feature independence), which may be unjustified. To address this shortcoming we propose a new model-based CF method, based on the maximum entropy principle. For the MS Web application, the new method is demonstrated to outperform a number of CF approaches, including naive Bayes and latent variable (cluster) models, support vector machines (SVMs), and the (Pearson) correlation method.
UR - http://www.scopus.com/inward/record.url?scp=3543101946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=3543101946&partnerID=8YFLogxK
DO - 10.1023/B:VLSI.0000027485.11890.15
M3 - Article
AN - SCOPUS:3543101946
SN - 1387-5485
VL - 37
SP - 199
EP - 209
JO - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
JF - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
IS - 2-3
ER -