Computer Engineering ›› 2023, Vol. 49 ›› Issue (3): 113-120,127. doi: 10.19678/j.issn.1000-3428.0063613

• Artificial Intelligence and Pattern Recognition •

Weakly Supervised Object Detection Based on Double Attention Erasure and Attention Information Aggregation

SONG Pengpeng1, GONG Shengrong1,2,3, ZHONG Shan2,3, ZHOU Lifan2, FENG Huanghao2   

  1. School of Computer Science and Technology, Soochow University, Suzhou 215000, Jiangsu, China;
  2. School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou 215000, Jiangsu, China;
  3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130000, China
  • Received: 2021-12-24 Revised: 2022-03-16 Published: 2022-08-08

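  • About the authors: SONG Pengpeng (宋鹏鹏, born 1997), male, master's student; his main research interests are deep learning and computer vision. GONG Shengrong (龚声蓉), professor, Ph.D. ZHONG Shan (钟珊) and ZHOU Lifan (周立凡), associate professors, Ph.D. FENG Huanghao (凤黄浩), Ph.D.
  • Funding:
    National Natural Science Foundation of China (61972059, 42071438); Natural Science Foundation of Jiangsu Province (BK20191474, BK20191475, BK20161268); Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172021K01, 93K172017K18).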

Abstract: Weakly Supervised Object Detection (WSOD) is mainly accomplished with multi-instance detection networks. However, because these methods rely on feature extraction networks designed for classification, detection results tend to converge to the most discriminative local region of an object, especially for non-rigid objects. To address this problem, this paper proposes DAENet, an end-to-end weakly supervised detection framework based on Double Attention Erasure (DAE) and Attention Information Aggregation (AIA). DAE generates attention masks and erases the most discriminative local foreground regions together with part of the background regions. This expands the discriminative region of an object so that the network attends to the object as a whole and better captures its entire extent. In addition, to locate the regions of different objects accurately and generate precise erasure masks, the paper introduces AIA, which extracts global and local channel features and introduces spatial dependencies to further improve detection accuracy. DAE and AIA work together to improve weakly supervised detection performance. Experiments on the PASCAL VOC 2007 and VOC 2012 datasets show that the proposed method achieves detection accuracies of 50.5% and 47.4%, respectively, and improves detection accuracy on some non-rigid objects by approximately 5%-20% over the baseline model.
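
To make the erasure idea concrete, the following is a minimal NumPy sketch of threshold-based attention erasure. It is an illustration under stated assumptions and not the authors' DAENet implementation: the function name `double_attention_erase`, the channel-mean attention map, and the thresholds `fg_thresh` and `bg_thresh` are all hypothetical choices, since the abstract does not specify how the masks are computed.

```python
import numpy as np


def double_attention_erase(features, fg_thresh=0.8, bg_thresh=0.1):
    """Zero out peak-response and low-response positions of a feature map.

    features: array of shape (C, H, W).
    Returns (erased_features, mask), where mask has shape (H, W).
    All names and threshold values are illustrative assumptions.
    """
    # Assumed attention map: channel-wise mean, min-max normalized to [0, 1].
    attention = features.mean(axis=0)
    lo, hi = attention.min(), attention.max()
    attention = (attention - lo) / (hi - lo + 1e-8)

    # Keep positions that are neither the strongest responses (treated here as
    # the most discriminative foreground) nor the weakest (treated as background).
    keep = (attention < fg_thresh) & (attention > bg_thresh)
    mask = keep.astype(features.dtype)

    # Broadcast the (H, W) mask over the channel dimension.
    return features * mask[None, :, :], mask


# Toy usage: two identical channels whose responses grow from 0 to 8.
feats = np.stack([np.arange(9.0).reshape(3, 3)] * 2)
erased, mask = double_attention_erase(feats)
# The lowest response and the two highest responses are zeroed in `erased`.
```

In a training setting, the erased features would be fed to the downstream detection branches so that the network has to rely on less discriminative parts of the object; how and where that happens in the paper's framework is not stated in the abstract.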

Key words: Weakly Supervised Object Detection (WSOD), erasure strategy, attention mechanism, non-rigid objects, deep learning

