
Computer Engineering ›› 2022, Vol. 48 ›› Issue (8): 70-76. doi: 10.19678/j.issn.1000-3428.0064039

• Artificial Intelligence and Pattern Recognition •

Entity Image Collection Based on Multi-Modality Pattern Transfer

JIANG Xueyao1, LI Weichen1, LIU Jingping2, LI Zhixu1, XIAO Yanghua1

  1. School of Software, Fudan University, Shanghai 200433, China;
    2. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2022-02-25; Revised: 2022-04-05; Published: 2022-05-25
  • About the authors: JIANG Xueyao (b. 1997), female, master's student; her research focuses on multi-modality knowledge graphs. LI Weichen, master's student; LIU Jingping, lecturer, Ph.D.; LI Zhixu and XIAO Yanghua, professors, Ph.D.
  • Funding: Shanghai Science and Technology Innovation Action Plan (19511120400).


Abstract: The core of constructing a multi-modality knowledge graph is matching correct and appropriate images to the entities in the graph. Existing entity image collection methods mainly use encyclopedia knowledge graphs and image search engines as sources of candidate entity images, but they apply these image data sources in a simplistic way, fail to accurately capture the characteristics of each source, and scale poorly. This paper proposes an entity image collection method based on multi-modality pattern transfer. The method extracts semantic templates and visual patterns from the head entities of each category and transfers them to the image acquisition process of non-head entities of the same category: the semantic templates are used to construct search engine keywords, and the visual patterns are used to denoise the retrieved results. The method ultimately collects 1.8×10⁶ images for 1.278×10⁵ entities across 25 categories in WikiData. Experimental results show that, compared with four multi-modality knowledge graphs (IMGpedia, VisualSem, Richpedia, and MMKG), the entity images in the knowledge graph constructed by this method are more accurate and more diverse. In the downstream link prediction task, introducing the collected images significantly improves prediction accuracy, reaching 59.74% on Hits@10, at least 12.7 percentage points higher than the compared methods.
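The two-stage pipeline described in the abstract (template-built search keywords, then visual denoising of the retrieved candidates) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the template format, the `Candidate` type, the feature vectors, and the cosine threshold are all hypothetical assumptions.

```python
# Illustrative sketch of multi-modality pattern transfer: a semantic
# template mined from head entities of a category is reused to build
# search keywords for non-head entities of the same category, and a
# visual pattern (here, a category prototype vector) filters noisy
# retrieval results. All names and values are assumptions.
from dataclasses import dataclass
import math


@dataclass
class Candidate:
    url: str
    feature: list  # image feature vector (e.g., from a CNN); dummy here


def build_query(entity: str, template: str) -> str:
    """Instantiate a category-level semantic template for one entity."""
    return template.format(entity=entity)


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def denoise(candidates, prototype, threshold=0.8):
    """Keep candidates visually consistent with the category prototype."""
    return [c for c in candidates if cosine(c.feature, prototype) >= threshold]
```

For example, `build_query("Eiffel Tower", "{entity} landmark photo")` yields the keyword string `"Eiffel Tower landmark photo"`, and `denoise` would discard a candidate whose feature vector is orthogonal to the category prototype.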

Keywords: multi-modality knowledge graph, symbol grounding, pattern transfer, link prediction, entity image collection
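Hits@10, the link prediction metric reported in the abstract, is the fraction of test queries for which the correct entity is ranked within the top 10 by the model. A minimal reference implementation (not from the paper):

```python
def hits_at_k(ranks, k=10):
    """Fraction of queries whose gold entity ranks within the top k.

    `ranks` holds the 1-based position of the correct entity in each
    query's ranked candidate list.
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

With `ranks = [1, 5, 11, 3]`, three of the four gold entities fall within the top 10, so `hits_at_k(ranks, k=10)` returns 0.75.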


CLC Number: