Computer Vision and Image Processing
TANG Weibo, FANG Qiang, LI Peigen, AI Longjin, XIONG Jinhong, XIA Haiting
The RSD-YOLO algorithm, based on YOLOv8s, is proposed to address the challenges inherent in Unmanned Aerial Vehicle (UAV) aerial images: low detection performance, severe occlusion, difficult small-target feature extraction, and large model parameter counts. First, the Receptive Field Attention (RFA) module CSP-RFA is designed to replace the C2f module, enhancing small-target feature extraction and effectively addressing the insensitivity of traditional convolution to positional changes. Second, the backbone and feature fusion networks are made lightweight, a new large-size feature map detection branch is added, and a Receptive Field Pyramid Network (RFPN) is proposed to optimize the feature flow and improve feature representation. Third, the detection head is optimized by integrating multi-scale features with a multi-level attention mechanism, and the loss function is updated to improve the model's detection performance for small targets. Finally, for model compression, the Layer-Adaptive Magnitude-based Pruning (LAMP) algorithm is employed to further reduce the number of parameters and the model size. Experimental results demonstrate that the lightweight RSD-YOLO model significantly outperforms the baseline on the publicly available VisDrone2019 dataset, with a 10.0 percentage point increase in precision, a 9.5 percentage point increase in mAP@0.5 (a 24.1% relative improvement), and a 6.9 percentage point increase in mAP@0.5:0.95 (a 29.4% relative improvement). The number of model parameters is reduced from 11.12×10^6 to 4.05×10^6, a 63.6% reduction, and the computational cost drops from 42.7 GFLOPs to 25.5 GFLOPs, a 40% reduction. Furthermore, on a newly filtered dataset focusing on small occluded targets, RSD-YOLO improves precision, mAP@0.5, and mAP@0.5:0.95 by 9.1, 16.1, and 10.7 percentage points, respectively.
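To illustrate the LAMP criterion mentioned above for model compression, the sketch below scores each weight by its squared magnitude divided by the sum of squared magnitudes of all weights in the same layer that are at least as large, then prunes the globally lowest-scoring fraction. This is a minimal NumPy illustration of the published LAMP scoring rule, not the authors' implementation; the function names and the toy two-layer setup are assumptions for demonstration only.

```python
import numpy as np

def lamp_scores(weights):
    """LAMP score of each weight w in one layer:
    score(w) = w^2 / sum of w'^2 over weights in the layer with |w'| >= |w|.
    The largest-magnitude weight in a layer always scores 1.0."""
    sq = weights.ravel() ** 2
    order = np.argsort(sq)                      # ascending by magnitude
    sorted_sq = sq[order]
    denom = np.cumsum(sorted_sq[::-1])[::-1]    # suffix sums: mass of weights >= each one
    scores = np.empty_like(sq)
    scores[order] = sorted_sq / denom
    return scores.reshape(weights.shape)

def lamp_prune(layers, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest
    LAMP scores, compared globally across all layers."""
    all_scores = np.concatenate([lamp_scores(w).ravel() for w in layers])
    threshold = np.quantile(all_scores, sparsity)
    return [np.where(lamp_scores(w) > threshold, w, 0.0) for w in layers]

# Toy example: two random "layers" pruned to roughly 50% global sparsity.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)), rng.standard_normal((4, 4))]
pruned = lamp_prune(layers, 0.5)
```

Because the denominator is computed per layer, LAMP adapts the effective pruning threshold to each layer's weight-magnitude distribution, avoiding the layer collapse that a single global magnitude cutoff can cause.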