Computer Engineering (计算机工程) ›› 2026, Vol. 52 ›› Issue (2): 148-157. doi: 10.19678/j.issn.1000-3428.0069729

• Computer Vision and Graphics and Image Processing •

UAV Image Small Object Detection Algorithm Based on Multi-layer Feature Fusion and Attention Mechanism

ZHANG Xinjia, WANG Fang

  1. School of Science, Yanshan University, Qinhuangdao 066000, Hebei, China
  • Received:2024-04-11 Revised:2024-09-05 Published:2024-11-06
  • About the authors: ZHANG Xinjia, male, master's student; main research interests: deep learning and object detection. WANG Fang (corresponding author), associate professor, Ph.D. E-mail: wangfang@ysu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62073234); Natural Science Foundation of Hebei Province (F2020203105); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022012).

Abstract: Objects in Unmanned Aerial Vehicle (UAV) aerial images are typically densely distributed in scale, frequently occluded, and mostly small, which makes missed and false detections likely. To address these challenges, this paper proposes SNA-YOLOv5s, a small object detection algorithm based on YOLOv5s. First, the strided convolution layers of the original model are replaced with the space-to-depth convolution (SPD-Conv) module, which avoids the loss of fine detail and strengthens feature extraction for small objects. Second, a novel Average Spatial Pyramid Pooling-Fast (AGSPPF) module is designed; it introduces an average pooling operation to mitigate the partial information loss that pooling layers incur while extracting features, improving the model's feature extraction capability. Third, a large-scale detection branch dedicated to small objects is added to capture the rich detail in shallow features and improve small object detection. Finally, the Normalization-based Attention Module (NAM) is embedded in the backbone network to weight the feature information and suppress uninformative features. Trained and tested on the VisDrone2019 and NWPU VHR-10 datasets, the algorithm achieves a mean Average Precision (mAP) of 42.3% and 96.5%, respectively, which is 8.4 and 2.6 percentage points higher than the baseline YOLOv5s model. Comparative experiments against other mainstream deep-learning-based models further validate the robustness and accuracy of the proposed model.
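The space-to-depth step behind SPD-Conv can be illustrated with a minimal pure-Python sketch. It assumes the usual formulation, in which a scale-2 space-to-depth rearrangement is followed by a non-strided convolution; only the rearrangement is shown here. The function names `space_to_depth` and `strided_sample`, the channel ordering, and the toy 4x4 input are illustrative assumptions and are not taken from the paper. The comparison against plain stride-2 sampling is a simplification meant only to show why no pixel is discarded.

```python
def space_to_depth(x, scale=2):
    """Rearrange a C x H x W nested list into (C*scale^2) x (H/scale) x (W/scale).

    Hypothetical helper: the channel ordering below is one arbitrary choice.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    # One group of output channels per (row offset, column offset) pair.
    for dy in range(scale):
        for dx in range(scale):
            for c in range(channels):
                out.append([
                    [x[c][i][j] for j in range(dx, width, scale)]
                    for i in range(dy, height, scale)
                ])
    return out


def strided_sample(x, stride=2):
    """Keep every `stride`-th pixel, as a stride-2 1x1 kernel would see it."""
    return [[row[::stride] for row in ch[::stride]] for ch in x]


# A single-channel 4x4 feature map with distinct values 0..15.
feature = [[[r * 4 + c for c in range(4)] for r in range(4)]]

spd = space_to_depth(feature)  # 4 channels, each 2x2
sub = strided_sample(feature)  # 1 channel, 2x2

kept_spd = sorted(v for ch in spd for row in ch for v in row)
kept_sub = sorted(v for ch in sub for row in ch for v in row)

print(len(spd), len(spd[0]), len(spd[0][0]))  # 4 2 2
print(kept_spd == list(range(16)))            # True: every pixel survives
print(kept_sub)                               # [0, 2, 8, 10]: 12 of 16 pixels dropped
```

Both outputs have the same halved spatial resolution, but the space-to-depth version moves the remaining pixels into extra channels instead of dropping them, so a subsequent stride-1 convolution can still use them.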

Key words: YOLOv5s model, small object detection, space-to-depth convolution (SPD-Conv), Spatial Pyramid Pooling (SPP), Normalization-based Attention Module (NAM)

CLC number: