Tang Weibo, Li Peigen, Fang Qiang, Ai Longjin, Xiong Jinhong, Xia Haiting
Accepted: 2024-12-05
The RSD-YOLO algorithm, based on YOLOv8s, is proposed to address the low detection accuracy, severe occlusion, and difficult small-target feature extraction encountered in UAV aerial images, as well as the large number of parameters in existing models. Firstly, the Receptive Field Attention (CSP-RFA) module is designed to replace the C2f module, enhancing small-target feature extraction and addressing the insensitivity of traditional convolutional operations to positional changes. Secondly, the backbone network and feature fusion network are made lightweight, a new detection branch operating on a large-size feature map is added, and a Receptive Field Pyramid Network (RFPN) is proposed to optimize the direction of feature flow and improve feature representation. Additionally, the detection head is optimized by integrating multi-scale features with a multi-level attention mechanism, and the loss function is updated to improve detection performance on small targets. For model compression, LAMP pruning is employed to further reduce the number of parameters and the model size. Experimental results demonstrate that the lightweight RSD-YOLO model significantly outperforms the baseline model on the publicly available VisDrone2019 dataset, with a 10.0% increase in precision, a 9.5 percentage-point increase in mAP0.5 (a 24.1% relative increase), and a 6.9 percentage-point increase in mAP0.5:0.95 (a 29.4% relative increase). The number of model parameters was reduced from 11.12 million to 4.05 million, a 63.6% reduction, and the computational cost was reduced from 42.7 GFLOPs to 25.5 GFLOPs, a reduction of approximately 40%. Furthermore, on a newly filtered dataset focusing on small occluded targets, RSD-YOLO showed improvements of 9.1%, 16.1%, and 10.7% in precision, mAP0.5, and mAP0.5:0.95, respectively.
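The reported reductions and gains can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the figures quoted in the abstract, not part of the proposed method. It assumes that the 9.5 and 6.9 figures are absolute gains in percentage points and that the parenthesized 24.1 and 29.4 figures are the corresponding relative gains; the baseline mAP values it prints are implied by that reading and are not stated in the abstract. The function names are hypothetical helpers introduced for illustration.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent (negative means a reduction)."""
    return (after - before) / before * 100.0


def implied_baseline(abs_gain_points, rel_gain_percent):
    """Baseline value implied by an absolute gain (points) and a relative gain (%).

    Assumes rel_gain_percent = abs_gain_points / baseline * 100.
    """
    return abs_gain_points / (rel_gain_percent / 100.0)


# Figures quoted in the abstract.
params_reduction = -relative_change(11.12, 4.05)  # millions of parameters
flops_reduction = -relative_change(42.7, 25.5)    # GFLOPs

# Implied (not reported) baseline scores under the assumption stated above.
base_map50 = implied_baseline(9.5, 24.1)
base_map5095 = implied_baseline(6.9, 29.4)

print(f"parameter reduction: {params_reduction:.1f}%")
print(f"GFLOPs reduction:    {flops_reduction:.1f}%")
print(f"implied baseline mAP0.5:      {base_map50:.1f}%")
print(f"implied baseline mAP0.5:0.95: {base_map5095:.1f}%")
```

Under these assumptions, the parameter figure reproduces the stated 63.6% reduction, and the GFLOPs figure comes out slightly above the rounded 40% quoted in the text.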