HU Shihao, JIA Zhiwei, LI Jiajun, SUN Chenhao
Accepted: 2026-06-26
This paper addresses foreign object detection in UAV-based transmission line inspection, where target scales vary greatly, small objects are easily missed, background interference is strong, occlusion is common, and airborne edge devices have limited computing resources. Building on YOLOv8n, it proposes a multi-branch lightweight algorithm, MBL-YOLO, for detecting typical foreign objects such as bird nests, kites, waste objects, and balloons. In inspection images, these objects often overlap visually with conductors, insulators, towers, and natural backgrounds, and their boundaries may be unclear, which can cause localization offsets and low confidence scores in compact detectors. The proposed method aims to improve the representation of multi-scale foreign objects against complex backgrounds without significantly increasing model complexity, while meeting the real-time, low-power, and lightweight deployment requirements of UAV edge platforms.
In terms of network structure, MBL-YOLO improves YOLOv8n in three respects: backbone feature extraction, cross-scale feature fusion, and detection-head lightweighting. First, a Mixed Dynamic Fusion Block (MDFB) is embedded into the C2f structure to build C2F-MDFB. Through dynamic kernel weighting and multi-scale depthwise separable convolution branches, this module adaptively adjusts the contributions of features with different receptive fields. This lets the network attend simultaneously to large objects such as bird nests, slender objects such as kite strings, small objects such as balloons, and the local features of irregular waste objects. Residual connections, channel mixing, and normalization further strengthen the interaction between low-level details and high-level semantics, reducing missed detections caused by blurred object boundaries or partial occlusion. Second, a weighted bidirectional feature pyramid network, BI-FPN, is introduced into the neck, where learnable weights fuse hierarchical features such as P3, P4, and P5. Shallow features retain edge texture and location information, while deeper semantic information suppresses background noise from conductors, towers, vegetation, and sky regions, improving small-object localization and class discrimination under complex backgrounds. Finally, to address the repeated parameters and redundant inference of the original independent per-scale detection branches, a shared convolutional detection head, Detect-LSCD, is designed. It replaces the repeated convolutions with two shared convolutional layers and uses GroupNorm to stabilize feature distributions under small-batch inference, reducing parameter count and computational cost while maintaining multi-scale detection capability.
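The learnable-weight fusion in the neck can be illustrated with a minimal sketch. The text does not state how the fusion weights are normalized, so the sketch assumes the "fast normalized fusion" rule commonly used in weighted bidirectional feature pyramids: each raw weight is clipped to be non-negative and divided by the sum of all clipped weights plus a small constant. The function name `weighted_fusion`, the flat-list feature representation, and the value of `eps` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def weighted_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> Tuple[List[float], List[float]]:
    """Fuse same-sized feature vectors with non-negative, normalized weights.

    Assumption: fast normalized fusion (ReLU on the raw weights, then
    division by their sum plus eps). The source only says the weights
    are learnable, not how they are normalized.
    """
    if not features or len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all features must share the same size")

    # Clip negative weights to zero so no input is subtracted from the fusion.
    clipped = [max(0.0, w) for w in raw_weights]
    total = sum(clipped) + eps
    norm = [w / total for w in clipped]

    # Element-wise weighted sum of the (already resized) inputs.
    fused = [
        sum(n * f[i] for n, f in zip(norm, features))
        for i in range(length)
    ]
    return fused, norm


if __name__ == "__main__":
    # Toy stand-ins for a P4 feature and an upsampled P5 feature.
    p4 = [1.0, 2.0]
    p5_upsampled = [3.0, 4.0]
    fused, norm = weighted_fusion([p4, p5_upsampled], [1.0, 3.0])
    print("normalized weights:", norm)
    print("fused feature:", fused)
```

In a real network the inputs would be tensors resized to a common resolution and the raw weights would be trained parameters; the sketch only shows how a larger learned weight lets one pyramid level dominate the fused output.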
Experiments are conducted on a self-built transmission line foreign object dataset. The dataset contains 4,200 images and 8,207 fully annotated targets, covering mountain and urban backgrounds. It includes 2,103 bird nests, 2,560 kites, 1,648 waste objects, and 1,896 balloons, and is split into 2,940 training images, 840 validation images, and 420 test images. Under the same training and testing conditions, MBL-YOLO achieves 97.5% Precision, 97.1% Recall, 97.3% mAP50, and 70.4% mAP50-95, with 2.08M parameters, 5.8 GFLOPs, and an inference speed of 175.6 FPS. Compared with YOLOv8n, mAP50-95 increases by 1.9 percentage points, the parameter count and computational cost decrease by about 30.9% and 29.3%, respectively, and FPS increases from 168.6 to 175.6. Compared with YOLOv9t, YOLOv10n, YOLOv11n, YOLOv12n, Gold-YOLO, YOLO-World, D-Fine-N, and DEIM-D-Fine-N, MBL-YOLO obtains the highest mAP50-95 while maintaining the lowest computational cost and a relatively small parameter count, indicating that the improvement does not rely on simply stacking modules. Ablation experiments show that C2F-MDFB, BI-FPN, and Detect-LSCD improve dynamic representation, weighted fusion, and lightweight prediction, respectively, and provide complementary benefits when combined. To verify cross-scene generalization, experiments are also conducted on the public VisDrone2019 dataset. MBL-YOLO achieves 41.7% Precision, 31.1% Recall, 31.2% mAP50, and 18.6% mAP50-95, improving on YOLOv8n by 4.3, 5.5, 4.3, and 2.1 percentage points, respectively. Visual results show that the model reduces missed detections, false detections, and redundant boxes in small-object, multi-class, dense-background, and occlusion scenes, demonstrating good feature retention and scene transfer capability.
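The dataset statistics and relative changes reported above can be cross-checked with a short sketch. The counts, split sizes, and percentages are taken from the text; the helper names and the dictionary keys are hypothetical. The YOLOv8n parameter count and GFLOPs are not stated in this section, so the sketch only back-computes the baseline values implied by the reported reductions, which should be read as an inference rather than a reported figure.

```python
from typing import Dict

# Figures quoted from the text above.
CLASS_COUNTS: Dict[str, int] = {
    "bird_nest": 2103,
    "kite": 2560,
    "waste": 1648,
    "balloon": 1896,
}
SPLIT: Dict[str, int] = {"train": 2940, "val": 840, "test": 420}


def split_ratios(split: Dict[str, int]) -> Dict[str, float]:
    """Return each subset's share of the total image count."""
    total = sum(split.values())
    return {name: count / total for name, count in split.items()}


def implied_baseline(reduced_value: float, reduction_pct: float) -> float:
    """Back-compute the baseline implied by a value and its relative reduction."""
    return reduced_value / (1.0 - reduction_pct / 100.0)


def relative_gain_pct(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, in percent."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    print("annotated targets:", sum(CLASS_COUNTS.values()))
    print("images:", sum(SPLIT.values()))
    print("split ratios:", split_ratios(SPLIT))
    # Inferred, not reported: baseline size implied by the stated reductions.
    print("implied YOLOv8n params (M):", implied_baseline(2.08, 30.9))
    print("implied YOLOv8n GFLOPs:", implied_baseline(5.8, 29.3))
    print("FPS gain (%):", relative_gain_pct(168.6, 175.6))
```

The per-class counts sum to the stated 8,207 targets, and the split corresponds to a 70/20/10 division of the 4,200 images.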
To assess engineering feasibility, MBL-YOLO is deployed on a UAV platform equipped with an NVIDIA Jetson TX2 and a ZED2 stereo camera. The model is exported to ONNX and then optimized with TensorRT 8.2 through FP16 half-precision quantization and operator fusion to generate a native TX2 inference engine. With a 640×640 input resolution, a batch size of 1, and the TX2 in Max-P mode, the end-to-end latency of MBL-YOLO is 38.5 ms per frame, and the overall frame rate remains stable at 26 FPS. Under the same conditions, YOLOv8n reaches 55.1 ms and 18 FPS, so the deployed frame rate improves by about 44.4%. The average power consumption of the TX2 and ZED2 working together is about 9.2 W, less than 3% of the UAV's flight power consumption. Overall, MBL-YOLO improves detection accuracy, recall, and real-time performance while reducing parameters and computation. It is suitable for UAV edge inspection platforms and offers a practical engineering solution for automatic foreign object recognition, abnormal-object warning, and intelligent operation and maintenance of transmission lines, while also laying a foundation for future multimodal perception and online inspection system integration.
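The relationship between the reported latencies, frame rates, and power share can be sketched as follows. All input values come from the text; the helper names are hypothetical. The sketch assumes frame rate is simply the reciprocal of end-to-end latency, and it suggests that the 44.4% figure corresponds to the rounded frame rates (18 to 26 FPS), while the raw latencies give a slightly smaller gain.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Frames per second, assuming one frame is processed per latency period."""
    return 1000.0 / latency_ms


def relative_gain_pct(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, in percent."""
    return (improved - baseline) / baseline * 100.0


def min_total_power_w(component_power_w: float, max_share_pct: float) -> float:
    """Smallest total power for which the component stays under the given share."""
    return component_power_w / (max_share_pct / 100.0)


if __name__ == "__main__":
    mbl_fps = fps_from_latency(38.5)      # reported as a stable 26 FPS
    yolov8n_fps = fps_from_latency(55.1)  # reported as 18 FPS
    print("MBL-YOLO FPS from latency:", mbl_fps)
    print("YOLOv8n FPS from latency:", yolov8n_fps)

    # Gain computed from the rounded frame rates quoted in the text.
    print("gain from rounded FPS (%):", relative_gain_pct(18.0, 26.0))
    # Gain computed from the unrounded latencies.
    print("gain from latencies (%):", relative_gain_pct(yolov8n_fps, mbl_fps))

    # 9.2 W being under 3% implies a flight power above this bound (inferred).
    print("implied flight power lower bound (W):", min_total_power_w(9.2, 3.0))
```

The lower bound on flight power is an inference from the stated 3% share, not a measured value.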