
Computer Engineering ›› 2022, Vol. 48 ›› Issue (9): 230-238. doi: 10.19678/j.issn.1000-3428.0062268

• Graphics and Image Processing •

Scene Graph Generation Model Combined with External Knowledge Base and Adaptive Reasoning

WANG Yini, GAO Yongbin, WAN Weibing, YANG Shuqun, GUO Ruyan

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
  • Received:2021-08-05 Revised:2021-10-15 Published:2021-10-29
  • About the authors: WANG Yini (born 1995), female, master's student, whose main research interests are computer vision and image processing; GAO Yongbin and WAN Weibing, associate professors; YANG Shuqun, professor; GUO Ruyan, master's student.
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (61802253).

Abstract: To capture important contextual information in a Scene Graph Generation (SGG) network while reducing the impact of dataset bias on SGG performance, this study proposes an SGG model based on an external knowledge base and adaptive reasoning. First, an object detection module combined with an external knowledge base introduces linguistic prior knowledge, improving the accuracy of relationship-category detection for entity pairs. Second, a Transformer-based context extraction module processes the candidate boxes and the entity-pair relationship categories with two Transformer encoder layers and merges context in stages through the self-attention mechanism to obtain important global context. Finally, because relationship frequencies follow a long-tailed distribution, an adaptive reasoning module with specialized feature fusion softens the distribution and classifies relationships adaptively according to the visual appearance of each entity pair, which alleviates the long-tail problem and improves the model's reasoning ability. Experimental results on the Visual Genome (VG) dataset show that, compared with the MOTIFS model, the proposed model improves Top-100 recall (Recall@100, R@100) on the Predicate Classification (PredCls), Scene Graph Classification (SGCls), and Scene Graph Generation (SGGen) subtasks by 1.4, 4.3, and 7.1 percentage points, respectively. The proposed model also generates better scene graphs for most relationship categories.

Key words: scene graph, visual relationship, external knowledge base, attention mechanism, adaptive reasoning
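
For readers unfamiliar with the Recall@K metric reported in the abstract, the following is a minimal Python sketch of how it is commonly computed for one image: the fraction of ground-truth (subject, predicate, object) triplets that appear among the K highest-ranked predictions. It is an illustration only, not the authors' evaluation code. The function name `recall_at_k`, the string triplet representation, and the label-only matching (no bounding-box overlap check, which subtasks such as SGGen additionally require) are assumptions made for brevity.

```python
from typing import List, Tuple

# Hypothetical representation: a relationship is a (subject, predicate, object) label triplet.
Triplet = Tuple[str, str, str]


def recall_at_k(ranked_predictions: List[Triplet],
                ground_truth: List[Triplet],
                k: int = 100) -> float:
    """Fraction of ground-truth triplets found among the top-k predictions.

    Assumes `ranked_predictions` is already sorted by descending confidence.
    Matching is by labels only; box localization is ignored in this sketch.
    """
    if not ground_truth:
        return 0.0
    top_k = set(ranked_predictions[:k])
    hits = sum(1 for triplet in ground_truth if triplet in top_k)
    return hits / len(ground_truth)


# Toy example with made-up predictions for a single image.
predictions = [
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
]
truth = [("man", "riding", "horse"), ("horse", "eating", "grass")]
score = recall_at_k(predictions, truth, k=2)
```

In this toy case one of the two ground-truth triplets is recovered within the top 2 predictions, so `score` is 0.5. Dataset-level figures such as R@100 are typically obtained by averaging this per-image value over the test set.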
