A Multi-Scale Object Detection Algorithm Oriented to Autonomous Driving

doi:10.19678/j.issn.1000-3428.0252697

Abstract

Object detection for autonomous driving perception aims to locate and identify traffic participants, such as motor vehicles, non-motor vehicles, and pedestrians, within the onboard camera's field of view in real time, providing accurate input to the environmental perception module and supporting decision-making and control in autonomous driving systems. Because road scenes have complex backgrounds, diverse object shapes, and large scale variations, perception systems suffer from high false detection and missed detection rates. To address low accuracy on deformed objects, insufficient multi-scale detection, and weak global perception, an improved algorithm based on YOLOv8n, named YOLOv8-DDL, is proposed. First, deformable attention (DAttention) is introduced to improve the C2f module of the backbone network; by dynamically learning feature offsets, it strengthens the capture of the varied object shapes found in traffic scenes, improves the model's adaptability to complex spatial distributions, and reduces false detections. Second, large separable kernel attention (LSKA) is integrated into the spatial pyramid pooling fast (SPPF) module, expanding the receptive field through large-kernel convolution to strengthen global context modeling and robustness against complex backgrounds. Finally, a dynamic multi-scale adaptive fusion (DMAF) module and a dynamic feature pyramid network (Dynamic-FPN) are designed to reconstruct the neck network; dynamically fusing high-level and low-level features enhances multi-scale feature representation and improves multi-scale object detection. Experiments on the public SODA10M dataset show that, compared with YOLOv8n, YOLOv8-DDL improves precision (P), recall (R), F1-score, and mean average precision (mAP@0.5) by 5.9%, 1.3%, 3%, and 1.5%, respectively. Supplementary validation on the public BDD100K dataset shows improvements of 2%, 0.6%, 1%, and 2% in the same metrics, respectively.
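For readers less familiar with the reported metrics, the following is a minimal sketch of how precision, recall, and F1-score relate to one another, using their standard definitions from true-positive, false-positive, and false-negative counts. The function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper; the paper's own evaluation code is not shown in this abstract.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from detection counts.

    tp: detections matched to a ground-truth object
    fp: detections with no matching ground-truth object (false detections)
    fn: ground-truth objects with no matching detection (missed detections)
    """
    # Precision: fraction of detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth objects that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f1 = detection_metrics(tp=90, fp=10, fn=30)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Because F1 is the harmonic mean of the two, it is pulled toward the lower of precision and recall, which is consistent with an F1 gain that sits between the reported precision and recall gains.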

HUANG Yuqi, YANG Xiaoxia, YANG Ronghao, LIAO Fangzhou, YAN Le, GUO Junqiang, LI Minghan. A Multi-Scale Object Detection Algorithm Oriented to Autonomous Driving[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252697.

黄玉琦, 杨晓霞, 杨容浩, 廖方舟, 严乐, 郭俊强, 李明涵. 面向自动驾驶的多尺度目标检测算法[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0252697.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252697

