Computer Engineering, 2024, Vol. 50, Issue (1): 242-250. doi: 10.19678/j.issn.1000-3428.0066724

• Graphics and Image Processing •

Small-Scale Object Detection Algorithm Integrating Attention and Feature Pyramids

Wenshun SHENG*, Xiongfeng YU, Jiayan LIN, Xin CHEN

  1. Pujiang Institute, Nanjing Tech University, Nanjing 211200, Jiangsu, China
  • Received: 2023-01-11 Online: 2023-03-29 Published: 2024-01-15
  • Corresponding author: Wenshun SHENG
  • Funding:
    Jiangsu Province Qing Lan Project (Su Jiao Shi Han [2021] No. 11); National Natural Science Foundation of China (61571222); General Program of the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (19KJD520005)

Abstract:

A modified Faster R-CNN algorithm that combines the Convolutional Block Attention Module (CBAM) with a feature pyramid structure is proposed to address the weak detection ability of Faster R-CNN for small-scale objects and for occluded or truncated objects. To focus on the informative local regions of the feature maps, CBAM is integrated into the feature extraction network, which reduces interference from irrelevant targets and improves the detection of occluded or truncated objects. A Feature Pyramid Network (FPN) structure is introduced to connect high-level and low-level features, yielding high-resolution, semantically strong feature data and thereby improving the detection of small objects. To alleviate vanishing gradients and reduce the number of hyperparameters, the commonly used VGG16 backbone is replaced with VS-ResNet, a more expressive inverted-residual network. VS-ResNet modifies part of the layer structure of the original ResNet50, adds auxiliary classifiers, and adopts inverted residuals and group convolutions, so that the information passing through the activation functions is fully preserved in a high-dimensional space and detection accuracy improves. A candidate-box score-resetting method compensates for the defect of the Non-Maximum Suppression (NMS) algorithm, which can mistakenly eliminate overlapping detection boxes. Experimental results demonstrate that VS-ResNet achieves 2.97 percentage points higher accuracy than VGG16 on the CIFAR-10 dataset, and that the proposed algorithm achieves an object detection mAP of 76.2% on the Pascal VOC 2012 dataset, 13.9 percentage points higher than that of the original Faster R-CNN algorithm.
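As a rough illustration of the attention step, the pure-Python sketch below applies a CBAM-style block to a tiny feature map: channel attention (a shared two-layer MLP over average- and max-pooled channel descriptors) followed by spatial attention (a convolution over the channel-wise average and max maps). It is a sketch under stated assumptions, not the authors' code: the weights are random and untrained, the reduction ratio `r` and the 3×3 kernel are chosen for the toy input (CBAM itself uses a 7×7 kernel), the function names are hypothetical, and it says nothing about where the authors insert CBAM in their network.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(fmap, w1, w2):
    """Channel attention: sigmoid(MLP(avg_pool) + MLP(max_pool)), one weight per channel.

    fmap: C channels, each an H x W list of lists.
    w1: (C // r) x C matrix and w2: C x (C // r) matrix forming the shared MLP (no biases).
    """
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]

    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    weights = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[x * s for x in row] for row in ch] for ch, s in zip(fmap, weights)]


def spatial_attention(fmap, kernel):
    """Spatial attention: sigmoid(conv([avg over channels; max over channels])).

    kernel: 2 x k x k weights (slice 0 for the average map, slice 1 for the max map), k odd.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])
    pad = k // 2  # "same" zero padding keeps the H x W size
    mask = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for m, pooled in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            acc += kernel[m][di][dj] * pooled[y][x]
            mask[i][j] = sigmoid(acc)
    return [[[fmap[c][i][j] * mask[i][j] for j in range(W)] for i in range(H)] for c in range(C)]


def cbam(fmap, w1, w2, kernel):
    """Sequential CBAM order: channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(fmap, w1, w2), kernel)


# Toy usage with random (untrained) weights; all sizes are illustrative assumptions.
random.seed(0)
C, H, W, r, k = 4, 5, 5, 2, 3
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
out = cbam(fmap, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # 4 5 5
```

Because both attention maps pass through a sigmoid, every weight lies between 0 and 1, so the block only rescales the input: less informative channels and positions are suppressed rather than new activations being added.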

Key words: deep learning, attention mechanism, feature pyramid, small object detection, truncated object detection
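The abstract states only that candidate-box scores are reset so that NMS no longer deletes overlapping boxes; it does not give the rescoring formula. As one concrete, hedged illustration of that idea, the sketch below implements Gaussian Soft-NMS (Bodla et al., ICCV 2017) next to standard NMS. The decay factor exp(-IoU²/σ), σ = 0.5 and the score threshold are Soft-NMS choices assumed here, not necessarily the authors' rule, and the function names are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_threshold=0.5):
    """Standard NMS: a box is discarded if it overlaps an already-kept box too much."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: overlapping boxes keep a decayed score instead of being deleted.

    Returns (index, rescored score) pairs in the order they are selected.
    """
    remaining = dict(enumerate(scores))
    kept = []
    while remaining:
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        if best_score < score_threshold:
            break  # every other remaining score is lower still
        kept.append((best, best_score))
        for i in remaining:
            # Reset the score: the larger the overlap with the selected box, the stronger the decay.
            remaining[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return kept


# Two heavily overlapping objects (IoU = 2/3) plus one separate object.
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(hard_nms(boxes, scores))  # [0, 2]
print([i for i, _ in soft_nms(boxes, scores)])  # [0, 2, 1]
```

With an IoU threshold of 0.5, standard NMS drops the second of two adjacent objects whose boxes overlap with IoU 2/3, whereas Soft-NMS keeps it with a reduced score (about 0.33 here). That retained, down-weighted box is the behavior that matters for occluded or truncated objects.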