Computer Engineering ›› 2021, Vol. 47 ›› Issue (2): 52-59. doi: 10.19678/j.issn.1000-3428.0056867

• Artificial Intelligence and Pattern Recognition •

Video Question Answering Scheme Based on Prior MASK Attention Mechanism

XU Zhenlei, DONG Hongwei

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214000, China
  • Received: 2019-12-10 Revised: 2020-01-18 Published: 2021-02-15 Released online: 2020-02-13
  • About the authors: XU Zhenlei (born 1993), male, master's student; main research interests: video understanding and data mining. DONG Hongwei, associate professor, Ph.D.
  • Funding:
    Jiangsu Province Industry-University-Research Cooperation Project (BY2015019-30).

Abstract: Video question answering (Video QA) is an active research topic in deep learning and is widely used in security and advertising systems. Within the attention mechanism framework, this paper proposes a prior MASK attention mechanism model. The Faster R-CNN model is used to extract the key frames of the video and the labels of the objects in the video, and three types of attention weighting are applied between these and the text features of the question. A MASK then screens out answers that are unrelated to the question, which improves the interpretability of the model. Experimental results show that the proposed model reaches 61% accuracy on the Video QA task and, compared with existing Video QA models such as VQA+ and SA+, offers both faster prediction and better prediction performance.

Keywords: video question answering (Video QA), computer vision, natural language processing, attention mechanism, MASK model
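
The abstract does not say how the MASK is built or applied, so the following is only a minimal sketch of the general idea of masking answers that are unrelated to the question. It assumes the mask is a 0/1 vector over a fixed answer vocabulary, chosen from the question type and applied before the softmax. The vocabulary `ANSWERS`, the table `PRIOR`, and the functions `prior_mask` and `masked_softmax` are hypothetical names introduced for illustration and are not taken from the paper.

```python
import math

# Hypothetical answer vocabulary and question-type prior (assumptions, not from the paper).
ANSWERS = ["yes", "no", "red", "blue", "two", "three"]
PRIOR = {
    "yes/no": {"yes", "no"},
    "color": {"red", "blue"},
    "count": {"two", "three"},
}


def prior_mask(question_type):
    """Return a 0/1 flag per answer: 1 if the answer fits the question type."""
    allowed = PRIOR[question_type]
    return [1 if answer in allowed else 0 for answer in ANSWERS]


def masked_softmax(logits, mask):
    """Softmax over answer scores; masked-out answers get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    kept = [x for x, m in zip(logits, mask) if m]
    if not kept:
        raise ValueError("mask removes every candidate answer")
    peak = max(kept)  # subtract the largest kept score for numerical stability
    exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores: "yes" scores highest, but the question asks about a color.
logits = [3.5, 0.5, 3.0, 1.0, 0.1, 0.2]
probs = masked_softmax(logits, prior_mask("color"))
best = ANSWERS[max(range(len(probs)), key=probs.__getitem__)]
print(best)
```

In this toy run the unmasked scores would favor "yes", while the masked distribution places all probability on the color answers. That is one way a prior over answers can make a prediction easier to interpret; the authors' actual formulation may differ.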

CLC Number: