Computer Engineering, 2025, Vol. 51, Issue (8): 190-202. doi: 10.19678/j.issn.1000-3428.0069636

• Artificial Intelligence and Pattern Recognition •

Self-Supervised Sequence Recommendation Algorithm Based on Personalized Data Augmentation

WANG Shuai, SHI Yancui*

  1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
  • Received: 2024-03-21 Revised: 2024-04-23 Online: 2025-08-15 Published: 2025-08-28
  • Contact: SHI Yancui
  • Supported by:
    National Natural Science Foundation of China (62377036)

Abstract:

Sequence recommendation algorithms dynamically model a user's historical behavior to predict content the user may be interested in. This study focuses on the application of contrastive Self-Supervised Learning (SSL) to sequence recommendation, designing effective self-supervised signals to strengthen the model's representation ability in sparse-data scenarios. First, because random data augmentation tends to introduce noise, a personalized data augmentation method that incorporates user preferences is proposed. The method uses user ratings to guide the augmentation process and applies different combinations of augmentation methods to long and short sequences, generating augmented sequences that align with user preferences. Second, a mixed-augmentation training approach is designed to alleviate imbalanced feature learning during training. In the early stage of training, augmented sequences are generated with randomly selected augmentation methods to improve model performance and generalization. In the later stage, augmented sequences with high similarity to the original sequences are selected so that the model comprehensively learns users' actual preferences and behavior patterns. Finally, the traditional sequence prediction objective is combined with the SSL objective to infer user representations. Experiments are conducted on the Beauty, Toys, and Sports datasets. Compared with the best result among the baseline models, the proposed method improves HR@5 by 6.61%, 3.11%, and 3.76% and NDCG@5 by 11.40%, 3.50%, and 2.16% on the three datasets, respectively. These results confirm the rationality and effectiveness of the proposed method.

Key words: sequence recommendation, Self-Supervised Learning (SSL), data augmentation, recommendation system, data features
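
The abstract reports results as HR@5 and NDCG@5. As a rough illustration of what these two metrics measure, the sketch below computes them under the common leave-one-out setting, where each user has exactly one held-out target item. The abstract does not describe the evaluation protocol, so that setting is an assumption, as is reading the reported percentages as relative gains over the best baseline. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def hit_ratio_at_k(ranked_items: Sequence[int], target: int, k: int = 5) -> float:
    """Return 1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items: Sequence[int], target: int, k: int = 5) -> float:
    """NDCG@k with a single relevant item: 1 / log2(rank + 1), rank counted from 1."""
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def relative_gain(proposed: float, baseline: float) -> float:
    """Relative improvement over a baseline score, in percent (assumed reading)."""
    return (proposed - baseline) / baseline * 100.0


# Hypothetical toy data: (ranked item ids from a model, held-out target id).
toy_users = [
    ([7, 3, 9, 1, 4], 3),  # target at rank 2
    ([5, 8, 2, 6, 0], 0),  # target at rank 5
    ([1, 2, 3, 4, 5], 9),  # target missing from the top 5
]

# Both metrics are averaged over users.
hr5 = sum(hit_ratio_at_k(r, t) for r, t in toy_users) / len(toy_users)
ndcg5 = sum(ndcg_at_k(r, t) for r, t in toy_users) / len(toy_users)
print(f"HR@5 = {hr5:.4f}, NDCG@5 = {ndcg5:.4f}")
```

HR@5 only records whether the target reaches the top five, while NDCG@5 also rewards placing it nearer the top, which is why the two metrics can improve by different amounts on the same dataset.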