
Computer Engineering, 2021, Vol. 47, Issue (8): 260-270. doi: 10.19678/j.issn.1000-3428.0058244

• Graphics and Image Processing •

Image Caption Model Based on Few-Shot Learning and Semantic Information

WANG Huiyong, LU Chao, ZHANG Xiaoming

  1. School of Information Science and Technology, Hebei University of Science and Technology, Shijiazhuang 050000, China
  • Received: 2020-05-06 Revised: 2020-07-13 Published: 2020-07-21
  • About the authors: WANG Huiyong (born 1980), male, lecturer, Ph.D.; his main research interests are pattern recognition, machine learning, and the Semantic Web. LU Chao, master's student. ZHANG Xiaoming (corresponding author), professor, Ph.D.
  • Funding:
    Natural Science Foundation of Hebei Province (F2018208116).

Abstract: Traditional image caption models can describe only objects that are already known to them. To overcome this limitation, this paper proposes a new image caption model that combines a few-shot object detector with a knowledge graph. The few-shot object detector detects objects that the caption model cannot recognize and provides their names, while the knowledge graph supplies background knowledge about these objects. Using this object information, an attention mechanism guides the model to select appropriate words and thereby generate description sentences that contain these objects. Experimental results show that the model improves the average F1 score by 6.6 percentage points over the baseline model and improves the quality of the generated descriptions by 2.0 percentage points on the SPICE metric, demonstrating the effectiveness of the proposed approach.

Key words: image caption, few-shot learning, knowledge graph, object detection, Recurrent Neural Network (RNN)
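The abstract says only that an attention mechanism, fed with the detector's object names and the knowledge graph's background knowledge, guides word selection. It does not give the formulation. Purely as an illustration, the sketch below assumes a pointer-style copy mechanism. This is a technique chosen here for the example and is not confirmed as the authors' method. At one decoding step, the decoder's hidden state attends over embeddings of candidate words (detected object names plus related knowledge-graph concepts). A gate then blends the resulting attention weights with the decoder's ordinary vocabulary distribution. Because of this blending, a word outside the caption vocabulary, such as a newly detected object name, can still be emitted. All names (`mix_word_distribution`, `candidates`, `gate`) and the toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mix_word_distribution(
    vocab_probs: Dict[str, float],
    hidden: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    gate: float,
) -> Dict[str, float]:
    """Blend the decoder's vocabulary distribution with attention over candidate words.

    Hypothetical formulation (an assumption, not taken from the paper):
        alpha_i = softmax_i(hidden . e_i)
        P(w)    = (1 - gate) * P_vocab(w) + gate * alpha_w
    `candidates` maps each detected object name or knowledge-graph concept
    to an embedding e_i. A trained model would predict `gate`; here it is
    passed in directly.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if not candidates:
        # Nothing detected: fall back to the plain vocabulary distribution.
        return dict(vocab_probs)
    words = list(candidates)
    alphas = softmax([dot(hidden, candidates[w]) for w in words])
    # Scale the ordinary vocabulary distribution by (1 - gate).
    mixed = {w: (1.0 - gate) * p for w, p in vocab_probs.items()}
    # Add the attention mass; candidate words may lie outside the vocabulary.
    for w, a in zip(words, alphas):
        mixed[w] = mixed.get(w, 0.0) + gate * a
    return mixed


if __name__ == "__main__":
    # Toy example: "zebra" is not in the caption vocabulary but was detected.
    vocab = {"a": 0.4, "dog": 0.3, "on": 0.3}
    cands = {"zebra": [1.0, 0.0], "stripes": [0.0, 1.0]}
    probs = mix_word_distribution(vocab, [2.0, 0.0], cands, gate=0.5)
    print(max(probs, key=probs.get))  # zebra
```

The result is still a probability distribution, because the two parts carry weights of `1 - gate` and `gate`. In this toy case the detected but out-of-vocabulary word "zebra" becomes the most likely next word.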

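The abstract reports an average F1 gain without restating how F1 is computed. Novel-object captioning evaluations commonly compute F1 per held-out object. An image counts as positive when it actually contains the object (for example, according to its reference annotations). A prediction counts as positive when the generated caption mentions the object's name. The per-object F1 scores are then averaged. The sketch below assumes exactly that definition, with single-word object names and exact token matching (no plurals or synonyms); the paper's protocol may differ in those details. The function names and toy data are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set


def object_f1(mentions: List[bool], present: List[bool]) -> float:
    """F1 for one object: captions should mention it exactly when the image contains it."""
    tp = sum(1 for m, p in zip(mentions, present) if m and p)
    fp = sum(1 for m, p in zip(mentions, present) if m and not p)
    fn = sum(1 for m, p in zip(mentions, present) if p and not m)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(
    captions: Dict[str, str],
    image_objects: Dict[str, Set[str]],
    novel_objects: Iterable[str],
) -> float:
    """Mean of the per-object F1 scores over the held-out (novel) objects."""
    objects = list(novel_objects)
    if not objects:
        raise ValueError("need at least one novel object")
    image_ids = sorted(captions)
    # Lower-cased word tokens of each generated caption.
    tokens = {i: set(re.findall(r"[a-z]+", captions[i].lower())) for i in image_ids}
    scores = []
    for obj in objects:
        mentions = [obj in tokens[i] for i in image_ids]
        present = [obj in image_objects.get(i, set()) for i in image_ids]
        scores.append(object_f1(mentions, present))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    caps = {
        "img1": "A zebra standing in a field.",
        "img2": "A horse in a field.",
        "img3": "A zebra next to a car.",
    }
    truth = {"img1": {"zebra"}, "img2": {"zebra"}, "img3": {"car"}}
    print(average_f1(caps, truth, ["zebra", "car"]))  # 0.75
```

In the toy data, "zebra" gets F1 = 0.5: it has one hit, one false mention, and one miss. "car" gets F1 = 1.0, so the average is 0.75. Under this definition, a gain of 6.6 percentage points means the averaged F1, expressed as a percentage, rises by 6.6.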