Computer Engineering

• Artificial Intelligence and Recognition Technology •

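Collaborative Filtering Algorithm Based on MapReduce and Item Classification

CHENG Xi (1), CHEN Jun (1, 2)

  (1. National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China; 2. Shenzhen Research Institute, Wuhan University, Shenzhen, Guangdong 518063, China)
  • Received: 2015-07-13; Online: 2016-07-15; Published: 2016-07-15
  • About the authors: CHENG Xi (born 1988), male, master's student; research interests: big data processing and cloud computing. CHEN Jun, professor and doctoral supervisor.
  • Funding:
    Supported by the National Natural Science Foundation of China (61170023).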

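Abstract: To address data sparsity and poor system scalability in traditional collaborative filtering algorithms, this paper proposes a new collaborative filtering algorithm. An item rating matrix is built from users' ratings of different items, a naive Bayes classifier assigns the items to classes, and each item's nearest-neighbor set is searched only within its own class using the adjusted cosine similarity. The data are processed in a distributed manner with the MapReduce parallel computing framework on the Hadoop platform, and a rating prediction list is finally produced for item recommendation. Experimental results show that, compared with collaborative filtering algorithms based on user classification and on item classification, the proposed algorithm effectively addresses the low prediction accuracy caused by data sparsity, achieves higher recommendation accuracy, and improves system efficiency and scalability through parallel computation.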
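
The abstract classifies items with a naive Bayes classifier but does not say which item attributes serve as features. The sketch below is a minimal categorical naive Bayes with add-one (Laplace) smoothing, assuming each item is described by a tuple of discrete attribute values (the hypothetical genre/era features in the usage lines) and that labelled training items are available. `train_nb` and `classify_nb` are illustrative names, not the paper's code.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(items, labels):
    """Fit a categorical naive Bayes model.

    items: list of feature tuples, one discrete value per attribute
    labels: class label of each training item
    """
    class_counts = Counter(labels)
    # value_counts[c][a][v]: training items of class c whose attribute a is v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_per_attr = defaultdict(set)  # distinct values seen per attribute
    for x, c in zip(items, labels):
        for a, v in enumerate(x):
            value_counts[c][a][v] += 1
            values_per_attr[a].add(v)
    return class_counts, value_counts, values_per_attr, len(labels)


def classify_nb(model, x):
    """Return the class with the highest smoothed log posterior for item x."""
    class_counts, value_counts, values_per_attr, n = model
    best, best_score = None, float("-inf")
    for c, nc in class_counts.items():
        score = log(nc / n)  # log prior P(c)
        for a, v in enumerate(x):
            # Add-one smoothing: a value never seen with class c
            # contributes a small probability instead of zero.
            k = len(values_per_attr[a])
            score += log((value_counts[c][a][v] + 1) / (nc + k))
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    # Hypothetical items described by (genre, era) with known classes.
    train = [("action", "new"), ("action", "old"), ("drama", "old"),
             ("drama", "new"), ("action", "new")]
    labels = ["thriller", "thriller", "romance", "romance", "thriller"]
    model = train_nb(train, labels)
    print(classify_nb(model, ("action", "old")))  # thriller
```

Working in log space avoids underflow when an item has many attributes; the resulting class labels would then restrict the neighbor search described in the abstract.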

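Key words: collaborative filtering, item classification, similarity computation, parallel computation, distributed processing, rating prediction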
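
Within each class, the abstract finds an item's nearest neighbors with the adjusted cosine similarity, which centres every rating on its user's mean before taking the cosine:

sim(i, j) = sum_u (R_u,i - mean_u)(R_u,j - mean_u) / ( sqrt(sum_u (R_u,i - mean_u)^2) * sqrt(sum_u (R_u,j - mean_u)^2) )

where u ranges over users who rated both i and j. This is the common textbook form; the paper's exact variant is not given in the abstract. The sketch assumes ratings stored as a dict of dicts and a precomputed item-to-class mapping standing in for the classifier output. The prediction rule (a similarity-weighted average over the k most similar, positively correlated same-class items, falling back to the user's mean) and the value of k are assumptions, since the abstract does not specify them.

```python
from math import sqrt


def user_means(ratings):
    """Mean rating of each user over the items that user rated."""
    return {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}


def adjusted_cosine(ratings, means, i, j):
    """Adjusted cosine similarity of items i and j over their co-rating users."""
    num = den_i = den_j = 0.0
    for u, r in ratings.items():
        if i in r and j in r:
            di, dj = r[i] - means[u], r[j] - means[u]  # centre on user mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # no co-rating users, or no variation: treat as unrelated
    return num / (sqrt(den_i) * sqrt(den_j))


def predict(ratings, item_class, user, item, k=2):
    """Predict `user`'s rating of `item` from its k nearest same-class items."""
    means = user_means(ratings)
    rated = ratings[user]
    # Neighbor search is restricted to items in the target item's class.
    candidates = [j for j in rated
                  if j != item and item_class[j] == item_class[item]]
    scored = [(adjusted_cosine(ratings, means, item, j), j) for j in candidates]
    # Keep positively correlated neighbors only, most similar first.
    neighbours = sorted((p for p in scored if p[0] > 0), reverse=True)[:k]
    den = sum(s for s, _ in neighbours)
    if den == 0:
        return means[user]  # no usable neighbor: fall back to the user mean
    return sum(s * rated[j] for s, j in neighbours) / den


if __name__ == "__main__":
    ratings = {
        "u1": {"a": 5, "b": 4, "c": 1},
        "u2": {"a": 4, "b": 5, "c": 2},
        "u3": {"a": 1, "b": 2, "c": 5},
        "u4": {"b": 4, "c": 2},
    }
    item_class = {"a": "x", "b": "x", "c": "y"}  # hypothetical classes
    print(predict(ratings, item_class, "u4", "a"))  # 4.0
```

In the usage lines, item "c" is excluded from the neighbors of "a" because it sits in a different class; it is also negatively correlated with "a", which is the kind of noisy neighbor that class restriction is meant to filter out.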

CLC number:
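
The abstract runs these computations with MapReduce on Hadoop. The sketch below does not use Hadoop; it imitates the MapReduce dataflow (map, shuffle by key, reduce) in a single Python process to show one plausible way to split the class-restricted similarity computation into two jobs. Job 1 groups ratings by user, centres them on the user mean, and emits partial sums only for pairs of items in the same class; job 2 adds those partial sums per item pair into an adjusted cosine similarity. The two-job layout, the `(user, item, rating)` record format, and all function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def run_job(records, mapper, reducer):
    """Single-process stand-in for one MapReduce job: map, shuffle, reduce."""
    groups = defaultdict(list)
    for rec in records:                   # map phase
        for key, value in mapper(rec):
            groups[key].append(value)     # shuffle: group values by key
    out = []
    for key in sorted(groups):            # reduce phase, one call per key
        out.extend(reducer(key, groups[key]))
    return out


def by_user(rec):
    """Job 1 mapper: (user, item, rating) -> user, (item, rating)."""
    user, item, rating = rec
    yield user, (item, rating)


def make_pair_reducer(item_class):
    """Job 1 reducer: centre one user's ratings and emit same-class pair terms."""
    def reducer(user, item_ratings):
        mean = sum(r for _, r in item_ratings) / len(item_ratings)
        # Sorting makes every pair key (i, j) appear with i < j.
        for (i, ri), (j, rj) in combinations(sorted(item_ratings), 2):
            if item_class[i] == item_class[j]:  # skip cross-class pairs
                di, dj = ri - mean, rj - mean
                yield (i, j), (di * dj, di * di, dj * dj)
    return reducer


def identity(rec):
    """Job 2 mapper: pass (pair, partial sums) through unchanged."""
    pair, terms = rec
    yield pair, terms


def sim_reducer(pair, terms):
    """Job 2 reducer: add up the partial sums into an adjusted cosine value."""
    num = sum(t[0] for t in terms)
    den_i = sum(t[1] for t in terms)
    den_j = sum(t[2] for t in terms)
    if den_i > 0 and den_j > 0:
        yield pair, num / (sqrt(den_i) * sqrt(den_j))


if __name__ == "__main__":
    records = [("u1", "a", 5), ("u1", "b", 4), ("u1", "c", 1),
               ("u2", "a", 4), ("u2", "b", 5), ("u2", "c", 2),
               ("u3", "a", 1), ("u3", "b", 2), ("u3", "c", 5)]
    item_class = {"a": "x", "b": "x", "c": "y"}  # hypothetical classes
    pairs = run_job(records, by_user, make_pair_reducer(item_class))
    sims = dict(run_job(pairs, identity, sim_reducer))
    print(sims)  # only the same-class pair ("a", "b") appears
```

Filtering pairs by class inside the job 1 reducer is what shrinks the shuffle: if a user's n rated items fall into C roughly equal classes, that user emits about n^2 / (2C) pairs instead of about n^2 / 2.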