Computer Engineering (计算机工程)

Improved YOLOv12n for Small Object Detection in Aerial Images

  • Published: 2026-04-14

Abstract: Small objects in UAV aerial images occupy few pixels, vary sharply in scale, and are densely distributed. To address these problems, an improved algorithm based on YOLOv12n, named SAM-YOLOv12n, is proposed. In the backbone network, a Dual-Attention Coupled C2f for Small Object (DA-C2f-S) module is designed; by introducing a multi-level feature extraction structure and a dual attention mechanism, it strengthens the capture of fine features of small objects such as edges and textures. A Multi-Scale Fusion Convolution (MSFConv) module is constructed, which takes Dilated Depthwise Separable Convolution (DDSConv) as its core and builds differentiated branches with different dilation rates. This enables joint modeling of local details and global contextual features, compensates for the limitation of a single-scale receptive field, and better fits the scale variation of small aerial objects. The detection head is restructured: the high-resolution branch is retained and the large-object detection head is removed, so that computation is concentrated on regions of dense small objects. Experimental results on the VisDrone2019 dataset show that the improved method raises mAP@0.5 and mAP@0.5:0.95 by 9.9% and 7.2%, respectively, over the baseline YOLOv12n, validating its effectiveness for small object detection in complex aerial scenes. Generalization experiments on the TinyPerson ultra-small object dataset and the HIT-UAV infrared aerial dataset verify the cross-domain adaptability of the proposed method across different aerial scenes. Its core advantage lies in balancing detection accuracy, model complexity, and inference efficiency, which provides reliable technical support for real-time object detection in UAV aerial imaging.
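The abstract does not give the internals of MSFConv, but the arithmetic behind branches of dilated depthwise separable convolution can be sketched with standard formulas. The snippet below is an illustration only: the kernel size (3), the dilation rates (1, 2, 3), the channel width (64), and all function names are assumptions chosen for the example, not values or identifiers taken from the paper.

```python
def effective_kernel(k: int, d: int) -> int:
    """Side length, in pixels, spanned by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def same_padding(k: int, d: int) -> int:
    """Padding that preserves spatial size at stride 1 (odd k assumed)."""
    return d * (k - 1) // 2


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a dense k x k convolution, bias omitted."""
    return k * k * c_in * c_out


def dds_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k conv followed by a pointwise 1 x 1 conv.

    Dilation spreads the kernel taps apart but adds no weights,
    so it does not appear in this count.
    """
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical branch configuration (assumed, not from the paper).
KERNEL = 3
DILATIONS = (1, 2, 3)
CHANNELS = 64

for d in DILATIONS:
    print(
        f"dilation={d}: span={effective_kernel(KERNEL, d)} px, "
        f"padding={same_padding(KERNEL, d)}, "
        f"params={dds_conv_params(CHANNELS, CHANNELS, KERNEL)}"
    )
print("dense 3x3 conv params:", standard_conv_params(CHANNELS, CHANNELS, KERNEL))
```

Under these assumed settings, the three branches span 3, 5, and 7 pixels with the same number of weights each, which is how branches with different dilation rates can cover both local detail and wider context without a matching growth in parameters.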