
计算机工程 ›› 2025, Vol. 51 ›› Issue (12): 140-150. doi: 10.19678/j.issn.1000-3428.0069802

• 人工智能与模式识别 •

基于图像置信度动态引导的多模态实体对齐

张晓明, 陈通庆, 王会勇*

  1. 河北科技大学信息科学与工程学院, 河北 石家庄 050018
  • 收稿日期:2024-04-28 修回日期:2024-07-07 出版日期:2025-12-15 发布日期:2024-08-20
  • 通讯作者: 王会勇
  • 基金资助:
    河北省自然科学基金(F2022208002); 石家庄市基础研究计划项目(241790867A)

Dynamic Guided Multimodal Entity Alignment Based on Image Confidence

ZHANG Xiaoming, CHEN Tongqing, WANG Huiyong*

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, Hebei, China
  • Received:2024-04-28 Revised:2024-07-07 Online:2025-12-15 Published:2024-08-20
  • Contact: WANG Huiyong

摘要:

多模态实体对齐旨在找到不同知识图谱中指向真实世界同一对象的实体, 进而融合知识图谱。然而, 图像作为多模态知识图谱中关键的信息载体, 其内在的噪声常常被忽略, 这不仅降低了实体对齐的准确性, 也影响了不同知识图谱融合的质量。因此, 提出一种基于图像置信度动态引导的实体对齐模型。该模型首先计算实体所对应的每个图像符合预设类型的置信度; 然后根据置信度动态挑选出类型一致且置信度最高的图像特征, 并利用这些特征进行相似度计算, 从而得到图像置信度引导的实体对齐相似度矩阵; 最后使用晚期融合策略将其与文本引导的实体对齐相似度矩阵相结合, 使模型能够有效地处理多模态实体对齐任务。在两个常用的多模态数据集上的实验结果表明, 该模型在性能上超越了现有的多种基线模型, 能够较好地实现多模态实体对齐。

关键词: 知识图谱, 多模态实体对齐, 知识融合, 多模态数据, 知识表示学习

Abstract:

Multimodal entity alignment identifies entities across different knowledge graphs that refer to the same real-world object, thereby enabling the fusion of these knowledge graphs. However, the inherent noise in images, which are critical carriers of information in multimodal knowledge graphs, is often overlooked. This oversight not only reduces the accuracy of entity alignment but also degrades the quality of knowledge graph fusion. Therefore, this paper proposes a dynamic guided multimodal entity alignment model based on image confidence. The model first calculates, for each image associated with an entity, the confidence that the image conforms to a preset type. Guided by these confidences, it then dynamically selects the type-consistent image features with the highest confidence and uses them for similarity calculation, yielding an image-confidence-guided entity alignment similarity matrix. Finally, a late-fusion strategy combines this matrix with a text-guided entity alignment similarity matrix, enabling the model to handle multimodal entity alignment tasks effectively. Experimental results on two widely used multimodal datasets demonstrate that the proposed model outperforms various existing baseline models and effectively achieves multimodal entity alignment.
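
The confidence-guided selection and late-fusion steps described in the abstract can be illustrated with a minimal sketch. The following Python snippet is not the authors' implementation; the function names (select_image_feature, similarity_matrix, late_fuse), the use of cosine similarity, and the fusion weight alpha are illustrative assumptions about how such a pipeline might be organized.

```python
# Minimal sketch (not the paper's code) of confidence-guided image selection
# followed by late fusion of image- and text-guided similarity matrices.
import numpy as np

def select_image_feature(image_feats, type_probs, preset_type):
    """For one entity, pick the feature of the image that matches the preset
    type with the highest confidence.

    image_feats : (num_images, dim) array of image embeddings
    type_probs  : (num_images, num_types) per-image type confidence scores
    preset_type : index of the type expected for this entity (assumed given)
    """
    conf = type_probs[:, preset_type]                    # confidence of matching the preset type
    consistent = type_probs.argmax(axis=1) == preset_type
    if consistent.any():
        masked = np.where(consistent, conf, -np.inf)     # restrict to type-consistent images
        best = int(masked.argmax())
    else:
        best = int(conf.argmax())                        # fallback: no type-consistent image
    return image_feats[best]

def similarity_matrix(feats_a, feats_b):
    """Cosine similarity between every entity pair of two knowledge graphs."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return a @ b.T

def late_fuse(sim_image, sim_text, alpha=0.5):
    """Late fusion: weighted combination of image- and text-guided matrices.
    The weight alpha is a hypothetical hyperparameter, not a value from the paper."""
    return alpha * sim_image + (1.0 - alpha) * sim_text
```

Under these assumptions, one would build per-graph matrices of selected image features, compute sim_image and sim_text for the two knowledge graphs, and rank each row of the fused matrix to obtain alignment candidates.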

Key words: knowledge graph, multimodal entity alignment, knowledge fusion, multimodal data, knowledge representation learning