
Computer Engineering ›› 2021, Vol. 47 ›› Issue (2): 46-51. doi: 10.19678/j.issn.1000-3428.0056332

• Artificial Intelligence and Pattern Recognition •

SVDPP Recommendation Algorithm Optimized by Q-learning Algorithm

ZHOU Yunteng, ZHANG Xueying, LI Fenglian, LIU Shuchang, JIAO Jiangli, TIAN Dou

  1. School of Information and Computer, Taiyuan University of Technology, Taiyuan 030600, China
  • Received: 2019-10-18 Revised: 2020-01-20 Online: 2021-02-15 Published: 2020-02-12
  • About the authors: ZHOU Yunteng (born 1995), male, master's student, whose main research interests are recommender systems and reinforcement learning; ZHANG Xueying (corresponding author) and LI Fenglian, professors with doctoral degrees; LIU Shuchang, master's student; JIAO Jiangli, Ph.D.; TIAN Dou, master's student.
  • Funding:
    Key Research and Development Program of Shanxi Province (Social Development Field) (201803D31045); Natural Science Foundation of Shanxi Province (201801D121138); Science and Technology Major Project of Shanxi Province (20181102008).

Abstract: To further improve the recommendation performance of personalized recommendation systems, this paper proposes a new Collaborative Filtering (CF) recommendation algorithm that optimizes the SVDPP algorithm with reinforcement learning. Taking the time effect of user ratings into account, the recommendation problem is transformed into a Markov Decision Process (MDP). On this basis, the Q-learning algorithm is used to construct a user rating optimization model that incorporates timestamp information. To address data sparsity, missing values are predicted by filling with predicted ratings rounded to the nearest integer and by an optimized boundary completion method. Experimental results show that the Root Mean Square Error (RMSE) of the proposed algorithm is 0.0056 lower than that of the SVDPP algorithm, which indicates that fusing timestamp information and applying reinforcement learning to optimize recommendation performance is feasible.

Key words: Collaborative Filtering (CF), Singular Value Decomposition (SVD), reinforcement learning, Markov Decision Process (MDP), Q-learning algorithm
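
The abstract names Q-learning but does not give the state, action, or reward design used in the paper. As a rough illustration only, the generic tabular Q-learning update can be sketched as below. The toy setup is an assumption made for this example and is not the paper's model: ratings 1 to 5 act as states, the guessed next rating is the action, the reward is the negative absolute error, and `alpha`, `gamma`, and `ratings_by_time` are hypothetical values.

```python
import numpy as np


def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the TD error."""
    td_target = reward + gamma * np.max(Q[next_state])
    td_error = td_target - Q[state, action]
    Q[state, action] += alpha * td_error
    return td_error


# Hypothetical toy data: one user's ratings (1..5), already sorted by timestamp.
ratings_by_time = [4, 5, 3, 4, 4, 5]

n_ratings = 5
# Rows: current rating (state). Columns: guessed next rating (action).
Q = np.zeros((n_ratings, n_ratings))

for _ in range(50):  # a few sweeps over the short sequence
    for prev, nxt in zip(ratings_by_time[:-1], ratings_by_time[1:]):
        s, s_next = prev - 1, nxt - 1
        for a in range(n_ratings):
            # Assumed reward: negative absolute error of the guessed rating.
            reward = -abs((a + 1) - nxt)
            q_learning_update(Q, s, a, reward, s_next)
```

In this toy setup, `Q[s].argmax() + 1` reads off the greedy guess for the next rating given the current rating `s + 1`. How such a value table would be combined with SVDPP predictions is not described in the abstract and is left open here.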

CLC Number: