
Computer Engineering, 2024, Vol. 50, Issue 5: 279-290. DOI: 10.19678/j.issn.1000-3428.0066384

• Development Research and Engineering Application •

Reinforcement Recommendation System Based on Causal Mechanism Constraint

ZHANG Sili1, LI Zijian1, CAI Ruichu1, HAO Zhifeng2, YAN Yuguang1   

    1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, Guangdong, China;
    2. College of Engineering, Shantou University, Shantou 515063, Guangdong, China
  • Received: 2022-11-28    Revised: 2023-03-13    Published: 2023-04-04
  • Contact: ZHANG Sili, E-mail: 827622899@qq.com
  • Supported by: the National Natural Science Foundation of China (61876043, 61976052, 62206061); the National Science Fund for Excellent Young Scholars (62122022); the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0111501).

Abstract: Training reinforcement learning recommendation systems on historical data has attracted increasing attention from researchers. However, historical data leads reinforcement learning models to misestimate state-action values, producing data biases such as popularity bias and selection bias. The causes are twofold: the distribution of the historical data is inconsistent with the distribution of the data that the reinforcement learning policy would collect, and the historical data itself carries bias. Causal mechanisms can resolve this data bias while simultaneously constraining the distribution of the data collected by the policy. This paper therefore proposes a reinforcement recommendation system based on a causal mechanism constraint, comprising a causal mechanism constraint module and a contrastive policy module. The causal mechanism constraint module limits the sample space from which the recommendation policy may choose, reducing the discrepancy between the policy distribution and the data distribution, and it models the dynamic change of item popularity over time to alleviate popularity bias. The contrastive policy module mitigates the impact of selection bias by balancing the importance of positive and negative samples. Experimental results on the real-world datasets Ciao and Epinions show that, compared with Deep Q-Network (DQN)-r, GAIL, SOFA, and other baselines, the proposed algorithm achieves superior accuracy and diversity; adding the causal mechanism constraint module improves the F-measure by 2% and 3% on the two datasets, respectively, further verifying the module's effectiveness.
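
To make the two modules concrete, the short PyTorch sketch below illustrates the general idea. It is a minimal illustration written for this summary, not the authors' implementation: the names (CausalConstraintMask, constrained_greedy_action, contrastive_policy_loss), the popularity-quantile rule, and the hinge-style loss are all assumptions chosen to match the behaviour the abstract describes.

import torch
import torch.nn.functional as F

class CausalConstraintMask:
    """Hypothetical constraint: at step t, only items whose empirical
    popularity at t lies above a quantile of the logged distribution stay
    in the candidate action space. Restricting the policy to items the
    logged data supports is one simple way to curb extrapolation error."""

    def __init__(self, popularity_per_step: torch.Tensor, quantile: float = 0.1):
        # popularity_per_step: float tensor of shape (T, num_items) holding
        # the empirical popularity of every item at each logged time step.
        self.popularity = popularity_per_step
        self.quantile = quantile

    def __call__(self, t: int) -> torch.Tensor:
        pop_t = self.popularity[t]
        threshold = torch.quantile(pop_t, self.quantile)
        return pop_t >= threshold  # boolean mask over the item catalogue

def constrained_greedy_action(q_values: torch.Tensor, mask: torch.Tensor) -> int:
    # Greedy action selection restricted to the constrained sample space:
    # masked-out items can never be recommended, whatever their Q-value.
    q = q_values.masked_fill(~mask, float("-inf"))
    return int(q.argmax().item())

def contrastive_policy_loss(pos_scores: torch.Tensor,
                            neg_scores: torch.Tensor,
                            neg_weight: float = 1.0,
                            margin: float = 1.0) -> torch.Tensor:
    # Hinge-style contrastive objective: re-weighting the negative term
    # keeps the abundant negatives (a source of selection bias) from
    # dominating the gradient relative to the scarce positives.
    pos_loss = F.relu(margin - pos_scores).mean()
    neg_loss = F.relu(margin + neg_scores).mean()
    return pos_loss + neg_weight * neg_loss

if __name__ == "__main__":
    # Toy usage: 5 logged time steps, a catalogue of 100 items.
    popularity = torch.rand(5, 100)
    mask_fn = CausalConstraintMask(popularity, quantile=0.2)
    q_values = torch.randn(100)
    print(constrained_greedy_action(q_values, mask_fn(t=3)))

In this reading, the causal mechanism constraint plays a role similar to action masking in batch reinforcement learning: the policy is never allowed to pick items outside the support of the logged, time-indexed popularity distribution, while the loss re-weighting addresses bias in how the interactions were logged.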

Key words: recommendation system, reinforcement learning, causal mechanism, extrapolation error, data bias

CLC Number: