Computer Engineering ›› 2023, Vol. 49 ›› Issue (7): 179-188. doi: 10.19678/j.issn.1000-3428.0065253

• Graphics and Image Processing •

Small Target Detection Based on Improved SSD Algorithm

Shan WU, Feng ZHOU*

  1. State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Received: 2022-07-15 Online: 2023-07-15 Published: 2023-07-14
  • Contact: Feng ZHOU
  • About the author:

    Shan WU (born 1996), female, master's student; research interests: image processing and target detection

  • Supported by:
    Guizhou Provincial Science and Technology Program (黔科合战略找矿[2022]ZD001)

Abstract:

SSD is a classic single-stage target detection algorithm that makes predictions from feature maps at six scales generated by different convolutional layers. However, its shallow feature maps are insufficiently nonlinear and lack semantic information, and small targets occupy few pixels, so much of their information is lost after repeated convolution operations. As a result, detection accuracy for small targets is far lower than for medium and large targets. A strategy that fuses multi-scale features with a hybrid attention mechanism is proposed: after the original backbone network is replaced, a bottom-up downsampling path and a top-down upsampling path are constructed. Specifically, the downsampling path uses a self-attention mechanism to adaptively enhance shallow spatial features and deep semantic features. In the upsampling path, the semantic information of deep features is enhanced by fusing the local and global information of feature maps at three scales; spatial and coordinate attention mechanisms are introduced to enrich the semantic and positional information, respectively, of the feature maps to be fused; and a self-attention enhancement module further strengthens the representational capability of the fused features. Experimental results show that, with 512×512 pixel input images, the proposed algorithm achieves a mean Average Precision (mAP) of 84.6% on the PASCAL VOC data set and 89.6% on the HRRSD data set, improvements of 6.1 and 8.8 percentage points, respectively, over the SSD algorithm.
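The reported gains are absolute differences in percentage points, so the SSD baseline implied by the abstract can be recovered by subtraction: roughly 78.5% mAP on PASCAL VOC and 80.8% on HRRSD. The short Python sketch below works through that arithmetic and also computes the corresponding relative improvement. The function and variable names are illustrative only, and the baselines are derived from the abstract's figures rather than quoted from the paper's tables.

```python
# Figures reported in the abstract for 512x512 input images.
reported = {
    "PASCAL VOC": {"improved_map": 84.6, "gain_points": 6.1},
    "HRRSD": {"improved_map": 89.6, "gain_points": 8.8},
}


def implied_baseline(improved_map, gain_points):
    """Baseline mAP (%) implied by an absolute gain in percentage points."""
    return round(improved_map - gain_points, 1)


def relative_gain(improved_map, gain_points):
    """Relative improvement over the implied baseline, in percent."""
    baseline = improved_map - gain_points
    return round(100.0 * gain_points / baseline, 1)


# Derived values: these are inferred from the abstract, not taken from the paper.
baselines = {name: implied_baseline(**v) for name, v in reported.items()}

for name, v in reported.items():
    print(name, "implied SSD baseline:", baselines[name],
          "relative gain (%):", relative_gain(**v))
```

Because the baseline is below 100%, the relative improvement is always larger than the percentage-point gain, which is why the two should not be used interchangeably.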

Key words: deep learning, attention mechanism, small target detection, feature enhancement, feature fusion
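The abstract credits coordinate attention with enriching the positional information of the feature maps to be fused. As a rough illustration of why that kind of attention preserves position, the pure-Python sketch below pools a single-channel feature map separately along each axis and uses the two pooled vectors as row and column gates. This is a deliberately simplified assumption, not the authors' module: coordinate attention as usually described passes the pooled descriptors through learned convolutions, which are omitted here, and `directional_gate` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def directional_gate(feature_map):
    """Reweight a single-channel H x W map with row and column gates.

    Simplification (assumption): the pooled descriptors go straight into a
    sigmoid instead of through learned convolutions.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    # Pool along the width: one descriptor per row (keeps vertical position).
    row_means = [sum(row) / width for row in feature_map]
    # Pool along the height: one descriptor per column (keeps horizontal position).
    col_means = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    row_gate = [sigmoid(m) for m in row_means]
    col_gate = [sigmoid(m) for m in col_means]
    # Each location is scaled by the gate of its row and the gate of its column.
    return [
        [feature_map[i][j] * row_gate[i] * col_gate[j] for j in range(width)]
        for i in range(height)
    ]


# Toy map with a single strong response at row 1, column 2 (a "small target").
toy = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
gated = directional_gate(toy)
```

Unlike global average pooling, which collapses the whole map to one number, the two directional pools keep track of which row and which column the response came from, so the gates are largest along the row and column that contain the target.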