
Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 91-100. doi: 10.19678/j.issn.1000-3428.0066929

• Artificial Intelligence and Pattern Recognition •

  • Funding: National Natural Science Foundation of China (62276196); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX1013); Key Research and Development Program of Hubei Province (2021BAA030)

Graph Neural Network Recommendation Algorithm Based on Multimodal Fusion

Zhiqiang WU1,3, Qing XIE1,2,3,*(), Lin LI1,2, Yongjian LIU1,2,3   

  1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China
    2. Engineering Research Center of Intelligent Service Technology for Digital Publishing, Ministry of Education, Wuhan 430070, Hubei, China
    3. Chongqing Research Institute of Wuhan University of Technology, Chongqing 401135, China
  • Received:2023-02-13 Online:2024-01-15 Published:2024-01-15
  • Contact: Qing XIE


Abstract:

Many existing Graph Neural Network(GNN) recommendation algorithms train only on the node ID information of the user-item interaction graph, learning the high-order connectivity among user and item nodes to enrich their representations. However, they ignore user preferences for different modalities: item modal information such as images and text is left unused, or the features of different modalities are fused by simple summation, which cannot distinguish a user's preferences among them. To address these problems, a multimodal fusion GNN recommendation model is proposed. First, for each single modality, a unimodal graph network is constructed from the user-item interaction bipartite graph, and the user's preference for that modality is learned on this graph. Then, a Graph ATtention(GAT) network aggregates neighbor information to enrich the node representation, while a Gated Recurrent Unit(GRU) decides whether neighbor information should be aggregated, providing a denoising effect. Finally, the user and item representations learned on each modal graph are fused through an attention mechanism to obtain the final representations, which are fed into the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that multimodal information and the attention fusion mechanism effectively improve recommendation accuracy: the proposed model significantly outperforms the best baseline algorithm on all three metrics, Precision@K, Recall@K, and NDCG@K. When K is set to 10, Precision@10, Recall@10, and NDCG@10 improve by 4.67%, 2.42%, and 2.03% and by 2.49%, 5.24%, and 2.05% on the two datasets, respectively.
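As a rough illustration of the pipeline the abstract describes (GAT-style neighbor aggregation, a GRU-inspired gate that decides how much neighbor information to keep, and attention fusion across modalities), the NumPy sketch below walks a single node through these steps. The dimensions, the weight matrices `W`, `a`, `Wz`, `q`, and the simplified single-head attention and gate are hypothetical stand-ins, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                              # embedding dimension (assumed)
modalities = ["visual", "textual"] # example modalities

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_aggregate(h_self, h_nbrs, W, a):
    """GAT-style aggregation: attention scores from a shared linear map W
    and attention vector a (single head, nonlinearity omitted)."""
    z_self = W @ h_self                 # (d,)
    z_nbrs = h_nbrs @ W.T               # (n_nbrs, d)
    alpha = softmax(z_nbrs @ a + z_self @ a)   # (n_nbrs,)
    return alpha @ z_nbrs               # attention-weighted sum, (d,)

def gated_update(h_self, h_agg, Wz):
    """GRU-inspired update gate: per-dimension choice between keeping the
    node's own features and mixing in aggregated neighbor features."""
    z = 1.0 / (1.0 + np.exp(-(Wz @ np.concatenate([h_self, h_agg]))))
    return z * h_agg + (1.0 - z) * h_self

def attention_fuse(reps, q):
    """Fuse per-modality representations with learned attention weights."""
    H = np.stack(reps)                  # (n_modalities, d)
    w = softmax(H @ q)                  # one weight per modality
    return w @ H                        # fused final representation, (d,)

# toy parameters and data: one node with 3 neighbors per modality
W  = rng.normal(size=(d, d))
a  = rng.normal(size=d)
Wz = rng.normal(size=(d, 2 * d))
q  = rng.normal(size=d)

per_modality = []
for _ in modalities:
    h_self = rng.normal(size=d)
    h_nbrs = rng.normal(size=(3, d))
    h_agg  = gat_aggregate(h_self, h_nbrs, W, a)
    per_modality.append(gated_update(h_self, h_agg, Wz))

final_rep = attention_fuse(per_modality, q)
print(final_rep.shape)  # (8,)
```

In the actual model these weights would be trained end-to-end and the gate would follow the full GRU formulation; the sketch only shows how the three stages compose.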
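The reported metrics Precision@K, Recall@K, and NDCG@K can be computed per user as below. This is a generic illustration of the standard metric definitions, not the paper's evaluation code; the sample ranking and ground-truth set are made up.

```python
import numpy as np

def precision_recall_ndcg_at_k(ranked_items, relevant, k=10):
    """Compute Precision@K, Recall@K, and NDCG@K for one user.

    ranked_items: item IDs sorted by predicted score, descending.
    relevant:     set of ground-truth items the user interacted with.
    """
    top_k = ranked_items[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)
    # DCG discounts each hit by log2 of its (1-based) rank + 1
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    # ideal DCG: all relevant items ranked at the top
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg

# toy example: 2 of the top 5 recommendations are relevant
p, r, n = precision_recall_ndcg_at_k([3, 1, 7, 9, 5], {1, 9, 2}, k=5)
print(round(p, 3), round(r, 3), round(n, 3))  # 0.4 0.667 0.498
```

In practice these per-user values are averaged over all test users to obtain the dataset-level numbers quoted in the abstract.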

Key words: multimodal recommendation, multimodal fusion, attention mechanism, Graph Neural Network(GNN), recommendation system, Gated Graph Neural Network(GGNN)