
Computer Engineering


Semantic Enhanced Multimodal Hypergraph Recommendation Model


  • Published: 2025-12-24


Abstract: Multimodal recommendation aims to enhance item representations by introducing multimodal content features such as visual and textual information, effectively alleviating data sparsity and cold-start problems while capturing user interests more accurately. However, existing methods mostly rely on hypergraph propagation mechanisms based on ID embeddings and often fail to adequately exploit the rich semantic information in multimodal features. To address these issues, this paper proposes a semantic-enhanced multimodal hypergraph recommendation model. First, the model constructs a user-item interaction view and an item-item semantic view, and uses graph convolutional networks to extract high-order collaborative signals from behavioral data and to uncover deep semantic relationships between items based on multimodal content, respectively. Second, a modality-aware fusion module is designed to dynamically aggregate the multimodal representations of users and items, balancing the contributions of different modalities. Then, user-user and item-item hypergraphs are constructed to explicitly model the group interest preferences of users and the high-order semantic relationships between items. Finally, to strengthen the mutual information between multimodal and behavioral features, the model introduces a collaborative contrastive learning mechanism with two auxiliary contrastive tasks: a modality alignment loss that ensures consistency between ID embeddings and multimodal semantics, and a neighbor aggregation loss that enhances the local robustness of the interaction structure, thereby achieving global semantic alignment and local structure preservation in a collaborative manner. Experimental results on three real-world datasets, Tiktok, Sports, and Clothing, show that the proposed model improves Recall@20 by 1.32%, 5.99%, and 6.58%, respectively, over the best baseline models, and NDCG@20 by 5.69%, 2.00%, and 7.61%, respectively.
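The abstract describes a modality alignment loss that pulls an item's ID embedding toward its multimodal semantic embedding. As a rough illustration only, the sketch below implements an InfoNCE-style alignment objective of that kind in PyTorch; the function name, temperature value, and batch-negative sampling are assumptions for exposition and not the authors' implementation.

```python
# Hypothetical sketch of a modality-alignment contrastive loss (InfoNCE-style):
# the ID embedding and the fused multimodal embedding of the same item form a
# positive pair; all other items in the batch serve as negatives.
import torch
import torch.nn.functional as F


def modality_alignment_loss(id_emb: torch.Tensor,
                            mm_emb: torch.Tensor,
                            temperature: float = 0.2) -> torch.Tensor:
    """id_emb, mm_emb: (batch, dim) embeddings for the same batch of items."""
    id_emb = F.normalize(id_emb, dim=-1)          # unit-normalize both views
    mm_emb = F.normalize(mm_emb, dim=-1)
    logits = id_emb @ mm_emb.t() / temperature    # pairwise cosine similarities
    labels = torch.arange(id_emb.size(0), device=id_emb.device)
    return F.cross_entropy(logits, labels)        # diagonal entries are positives


if __name__ == "__main__":
    # Example usage with random embeddings for 8 items of dimension 64.
    ids = torch.randn(8, 64)
    mms = torch.randn(8, 64)
    print(modality_alignment_loss(ids, mms))
```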
