
Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 91-100. doi: 10.19678/j.issn.1000-3428.0066929

• Artificial Intelligence and Pattern Recognition •

  • Funding: National Natural Science Foundation of China (62276196); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX1013); Key Research and Development Program of Hubei Province (2021BAA030)

Graph Neural Network Recommendation Algorithm Based on Multimodal Fusion

Zhiqiang WU1,3, Qing XIE1,2,3,*(), Lin LI1,2, Yongjian LIU1,2,3   

  1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China
    2. Engineering Research Center of Intelligent Service Technology for Digital Publishing, Ministry of Education, Wuhan 430070, Hubei, China
    3. Chongqing Research Institute of Wuhan University of Technology, Chongqing 401135, China
  • Received:2023-02-13 Online:2024-01-15 Published:2024-01-15
  • Contact: Qing XIE


Abstract:

Many existing Graph Neural Network(GNN) recommendation algorithms train only on the node ID information of the user-item interaction graph, learning the high-order connectivity among user and item nodes to enrich their representations. However, they ignore user preferences for different modalities: item modal information such as images and text is left unused, or the features of different modalities are fused by simple summation, which cannot distinguish a user's preferences among them. To address these problems, a multimodal fusion GNN recommendation model is proposed. First, for each single modality, a unimodal graph network is constructed from the user-item interaction bipartite graph, and the user's preference for that modality is learned on this graph. Then, a Graph ATtention(GAT) network aggregates neighbor information to enrich the node representation, while a Gated Recurrent Unit(GRU) decides whether neighbor information should be aggregated, providing a denoising effect. Finally, the user and item representations learned on each modal graph are fused through an attention mechanism to obtain the final representations, which are fed into the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that multimodal information and the attention fusion mechanism effectively improve recommendation accuracy: the proposed model significantly outperforms the best baseline algorithm on all three metrics, Precision@K, Recall@K, and NDCG@K. When K is set to 10, Precision@10, Recall@10, and NDCG@10 improve by 4.67%, 2.42%, and 2.03% and by 2.49%, 5.24%, and 2.05% on the two datasets, respectively.
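As a rough illustration of the pipeline the abstract describes (GAT-style neighbor aggregation, a GRU-inspired gate that decides how much neighbor information to keep, and attention fusion across modalities), the NumPy sketch below walks a single node through these steps. The dimensions, the weight matrices `W`, `a`, `Wz`, `q`, and the simplified single-head attention and gate are hypothetical stand-ins, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                              # embedding dimension (assumed)
modalities = ["visual", "textual"] # example modalities

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_aggregate(h_self, h_nbrs, W, a):
    """GAT-style aggregation: attention scores from a shared linear map W
    and attention vector a (single head, nonlinearity omitted)."""
    z_self = W @ h_self                 # (d,)
    z_nbrs = h_nbrs @ W.T               # (n_nbrs, d)
    alpha = softmax(z_nbrs @ a + z_self @ a)   # (n_nbrs,)
    return alpha @ z_nbrs               # attention-weighted sum, (d,)

def gated_update(h_self, h_agg, Wz):
    """GRU-inspired update gate: per-dimension choice between keeping the
    node's own features and mixing in aggregated neighbor features."""
    z = 1.0 / (1.0 + np.exp(-(Wz @ np.concatenate([h_self, h_agg]))))
    return z * h_agg + (1.0 - z) * h_self

def attention_fuse(reps, q):
    """Fuse per-modality representations with learned attention weights."""
    H = np.stack(reps)                  # (n_modalities, d)
    w = softmax(H @ q)                  # one weight per modality
    return w @ H                        # fused final representation, (d,)

# toy parameters and data: one node with 3 neighbors per modality
W  = rng.normal(size=(d, d))
a  = rng.normal(size=d)
Wz = rng.normal(size=(d, 2 * d))
q  = rng.normal(size=d)

per_modality = []
for _ in modalities:
    h_self = rng.normal(size=d)
    h_nbrs = rng.normal(size=(3, d))
    h_agg  = gat_aggregate(h_self, h_nbrs, W, a)
    per_modality.append(gated_update(h_self, h_agg, Wz))

final_rep = attention_fuse(per_modality, q)
print(final_rep.shape)  # (8,)
```

In the actual model these weights would be trained end-to-end and the gate would follow the full GRU formulation; the sketch only shows how the three stages compose.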
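The reported metrics Precision@K, Recall@K, and NDCG@K can be computed per user as below. This is a generic illustration of the standard metric definitions, not the paper's evaluation code; the sample ranking and ground-truth set are made up.

```python
import numpy as np

def precision_recall_ndcg_at_k(ranked_items, relevant, k=10):
    """Compute Precision@K, Recall@K, and NDCG@K for one user.

    ranked_items: item IDs sorted by predicted score, descending.
    relevant:     set of ground-truth items the user interacted with.
    """
    top_k = ranked_items[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)
    # DCG discounts each hit by log2 of its (1-based) rank + 1
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    # ideal DCG: all relevant items ranked at the top
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg

# toy example: 2 of the top 5 recommendations are relevant
p, r, n = precision_recall_ndcg_at_k([3, 1, 7, 9, 5], {1, 9, 2}, k=5)
print(round(p, 3), round(r, 3), round(n, 3))  # 0.4 0.667 0.498
```

In practice these per-user values are averaged over all test users to obtain the dataset-level numbers quoted in the abstract.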

Key words: multimodal recommendation, multimodal fusion, attention mechanism, Graph Neural Network(GNN), recommendation system, Gated Graph Neural Network(GGNN)