An Improved Algorithm for Small Object Detection in UAV Aerial Images Based on RT-DETR

doi:10.19678/j.issn.1000-3428.0252661

Abstract

Object detection in images captured by small, lightweight UAVs commonly suffers from low detection accuracy, complex backgrounds, large variations in target scale, dense target distribution, and a large number of model parameters. To address these problems, this paper proposes a UAV object detection algorithm based on an improved RT-DETR. First, the C2f module is modified with the heat conduction module HeatBlock and the spatial selective attention module LskBlock to obtain the C2f-Heat-Lsk module. The RT-DETR backbone network is then redesigned using both C2f-Heat-Lsk and the original C2f modules, which improves the backbone's ability to extract features of small targets while reducing the number of model parameters. Second, a feature fusion structure, SOFEP, is proposed to replace the feature pyramid of the original network, mitigating the loss of detail information for small targets and strengthening their feature representation. Finally, the Focaler-IoU and MPDIoU loss functions are combined into a Focaler-MPDIoU loss function, which improves bounding box regression accuracy and thereby reduces the model's missed detection rate. Experimental results on the VisDrone test set show that the improved model has 16.9% fewer parameters than RT-DETR, while mAP0.5 and mAP0.5:0.9 improve by 2.6% and 1.9%, respectively. The improved model also outperforms RT-DETR on the DOTAv1.0 and HIT-UAV datasets. It achieves higher detection accuracy while keeping the parameter count small, meeting the application requirements of small object detection in UAV aerial images.
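The abstract names the Focaler-MPDIoU loss but does not give its formula. The following is a minimal sketch of one plausible construction, assuming the combination pattern suggested in the Focaler-IoU paper (base loss + IoU - remapped IoU) applied to the MPDIoU loss. The function names, the `(x1, y1, x2, y2)` box format, and the thresholds `d` and `u` are illustrative assumptions, not values taken from this paper.

```python
def iou_xyxy(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_mpdiou_loss(pred, gt, img_w, img_h, d=0.0, u=0.95):
    """Hypothetical Focaler-MPDIoU loss for a single box pair.

    d and u are assumed interval bounds for the Focaler remapping
    (requires u > d); the abstract does not state the values used.
    """
    iou = iou_xyxy(pred, gt)

    # MPDIoU: subtract the squared distances between matching
    # top-left and bottom-right corners, each normalised by the
    # squared diagonal of the input image.
    diag2 = img_w ** 2 + img_h ** 2
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    mpdiou = iou - d1 / diag2 - d2 / diag2

    # Focaler-IoU: linearly remap IoU from [d, u] onto [0, 1],
    # clamping values outside the interval.
    iou_focaler = min(1.0, max(0.0, (iou - d) / (u - d)))

    # Assumed combination: L_MPDIoU + IoU - IoU_focaler.
    return (1.0 - mpdiou) + iou - iou_focaler
```

With `d=0` and `u=1` the remapping is the identity and the sketch reduces to the plain MPDIoU loss, which is a convenient sanity check when experimenting with other interval settings.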

TIAN Hongpeng, LI Zhiqiang, YANG Sai. An Improved Algorithm for Small Object Detection in UAV Aerial Images Based on RT-DETR[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252661.

