HUANG Yuqi, YANG Xiaoxia, YANG Ronghao, LIAO Fangzhou, YAN Le, GUO Junqiang, LI Minghan
Accepted: 2025-10-30
Object detection for autonomous driving perception aims to locate and identify traffic participants such as motor vehicles, non-motor vehicles, and pedestrians within onboard camera views in real time, providing accurate input for the environmental perception module to support decision-making and control in autonomous driving systems. Perception systems suffer from high false-detection and missed-detection rates owing to complex road backgrounds, diverse object shapes, and large scale variations; specific challenges include low accuracy on deformed objects, insufficient multi-scale detection capability, and weak global perception. To address these issues, an improved algorithm named YOLOv8-DDL, based on YOLOv8n, is proposed. First, deformable attention is introduced to improve the C2f module in the backbone network: by dynamically learning sampling offsets, it strengthens the capture of varied object shapes in traffic scenes, improves the model's adaptability to complex spatial distributions, and effectively reduces false detections. Second, large separable kernel attention is integrated into the spatial pyramid pooling fast (SPPF) module, expanding the receptive field through large-kernel convolution to strengthen global context modeling and robustness against complex backgrounds. Finally, a dynamic multi-scale adaptive fusion module and a dynamic feature pyramid network are designed to reconstruct the neck network, dynamically fusing high-level and low-level features to enrich multi-scale feature representation and improve multi-scale object detection. Experimental results on the public SODA10M dataset show that, compared with YOLOv8n, YOLOv8-DDL improves precision, recall, F1-score, and mean average precision by 5.9%, 1.3%, 3%, and 1.5%, respectively. Additional validation on the public BDD100K dataset confirms improvements of 2%, 0.6%, 1%, and 2% in these metrics, respectively.
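The core idea behind the deformable attention used in the improved C2f module is that sampling locations are not fixed to a regular grid: a learned offset shifts each sampling point, and the feature map is read at the shifted fractional coordinates via bilinear interpolation. The following is a minimal NumPy sketch of that sampling step only (the function names `bilinear_sample` and `deformable_sample` are illustrative, not from the paper, and the offsets here are supplied by hand rather than predicted by a network):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a single-channel map feat (H, W)
    at fractional coordinates (y, x)."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - y0, x - x0  # fractional weights
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def deformable_sample(feat, base_points, offsets):
    """Read feat at each base grid point shifted by its (learned) offset.
    In deformable attention the offsets come from a small conv branch;
    here they are plain inputs for illustration."""
    return np.array([bilinear_sample(feat, y + dy, x + dx)
                     for (y, x), (dy, dx) in zip(base_points, offsets)])
```

Because the offsets are continuous, the sampling grid can stretch toward an object's actual silhouette (e.g. an elongated truck or a leaning pedestrian) instead of staying axis-aligned, which is what gives the module its adaptability to deformed shapes.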
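Large separable kernel attention rests on a standard decomposition: a large K×K convolution whose kernel is an outer product of two 1-D kernels can be computed as a 1×K pass followed by a K×1 pass, keeping the K×K receptive field while storing only 2K weights per channel instead of K². A minimal NumPy sketch under that assumption (the helper `conv2d_same` is illustrative; a real LSKA block also adds dilation and a gating multiplication, which are omitted here):

```python
import numpy as np

def conv2d_same(img, kernel):
    """'Same'-padded 2D cross-correlation on a single-channel map."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    H, W = img.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

K = 7
v = np.random.default_rng(0).normal(size=K)   # vertical 1-D kernel (K x 1)
h = np.random.default_rng(1).normal(size=K)   # horizontal 1-D kernel (1 x K)
full = np.outer(v, h)                          # equivalent K x K kernel: 49 weights
img = np.random.default_rng(2).normal(size=(16, 16))

out_full = conv2d_same(img, full)                                 # one K x K pass
out_sep = conv2d_same(conv2d_same(img, h[None, :]), v[:, None])   # 1xK then Kx1: 14 weights
```

`out_full` and `out_sep` agree to floating-point precision, which is why the separable form can afford much larger kernels (and hence a wider receptive field for global context) at modest cost.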
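The dynamic fusion idea in the reconstructed neck can be illustrated in its simplest form: resample features from different pyramid levels to a common resolution, then combine them with normalized, input-dependent weights rather than a fixed sum. The sketch below assumes softmax-normalized scalar weights and nearest-neighbor upsampling; the paper's module operates on full tensors with weights predicted per location, so this is only a schematic (`adaptive_fuse` and `logits` are hypothetical names):

```python
import numpy as np

def softmax(w):
    """Numerically stable softmax over a 1-D weight vector."""
    e = np.exp(w - np.max(w))
    return e / e.sum()

def upsample2x(feat):
    """Nearest-neighbor 2x upsampling of an (H, W) map."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def adaptive_fuse(low, high, logits):
    """Fuse a low-level map with an upsampled high-level map using
    softmax-normalized weights (a stand-in for learned, dynamic weights)."""
    w = softmax(logits)
    return w[0] * low + w[1] * upsample2x(high)
```

With equal logits this degenerates to plain averaging; a learned weight branch lets the network lean on fine low-level detail for small objects and on semantic high-level context for large ones, which is the mechanism behind the multi-scale gains reported above.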