Abstract:
This paper addresses foreign object detection in UAV-based transmission line inspection, where target scales vary widely, small objects are easily missed, background interference is strong, occlusion is common, and airborne edge devices have limited computing resources. With YOLOv8n as the baseline, a multi-branch lightweight algorithm named MBL-YOLO is proposed for detecting typical foreign objects such as bird nests, kites, waste objects, and balloons. In inspection images, these objects often overlap visually with conductors, insulators, towers, and natural backgrounds. Their boundaries are often indistinct, which can cause localization offsets and low confidence scores in lightweight detectors. The method aims to strengthen the network's representation of multi-scale foreign objects and complex backgrounds without significantly increasing model complexity, while meeting the real-time, low-power, and lightweight deployment requirements of UAV edge platforms.
In terms of network structure, MBL-YOLO improves YOLOv8n in three respects: backbone feature extraction, cross-scale feature fusion, and detection-head lightweighting. First, a Mixed Dynamic Fusion Block is embedded into the C2f structure to build C2F-MDFB. Through dynamic kernel weighting and multi-scale depthwise separable convolution branches, this module adaptively adjusts the contributions of features with different receptive fields, so that the network can attend simultaneously to large objects such as bird nests, slender objects such as kite strings, small objects such as balloons, and the local features of irregularly shaped waste objects. Residual connections, channel mixing, and normalization further strengthen the interaction between low-level details and high-level semantics, reducing missed detections caused by blurred object boundaries or partial occlusion. Second, a weighted bidirectional feature pyramid, BI-FPN, is introduced into the neck. Learnable weights fuse hierarchical features such as P3, P4, and P5: shallow features retain edge texture and location information, while deep semantic constraints suppress background noise from conductors, towers, vegetation, and sky regions. This improves small-object localization and class discrimination under complex backgrounds. Finally, to address the duplicated parameters and redundant inference of the original independent multi-scale detection branches, a shared convolutional detection head named Detect-LSCD is designed. It replaces the repeated convolutions with two shared convolutional layers and uses GroupNorm to stabilize feature distributions under small-batch inference, reducing parameter count and computational cost while maintaining multi-scale detection capability.
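The weighted fusion and head-sharing ideas above can be illustrated with a minimal, framework-free sketch. It is not the MBL-YOLO implementation. It assumes the "fast normalized fusion" rule commonly used with weighted bidirectional feature pyramids (non-negative weights divided by their sum), and it assumes that all pyramid levels share one channel width so that a single convolution stack can serve every level. The function names `fuse_weighted`, `conv_params`, and `head_conv_params` are hypothetical, and feature maps are reduced to flat lists of numbers for brevity.

```python
# Illustrative sketch only; names and simplifications are assumptions,
# not the paper's code.

def fuse_weighted(features, raw_weights, eps=1e-4):
    """Fast normalized fusion of same-shaped features.

    out = sum_i(w_i * x_i) / (eps + sum_j w_j), with w_i = max(raw_i, 0).
    `features` is a list of equally long lists of floats.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature")
    weights = [max(w, 0.0) for w in raw_weights]  # keep weights non-negative
    total = sum(weights) + eps                    # eps avoids division by zero
    length = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(weights, features)) / total
        for k in range(length)
    ]


def conv_params(c_in, c_out, kernel=3):
    """Weight count of a bias-free kernel x kernel convolution."""
    return c_in * c_out * kernel * kernel


def head_conv_params(channels, num_levels, num_convs=2, shared=False):
    """Convolution weights in the head's conv stack.

    With shared=True one stack serves every pyramid level; otherwise each
    level owns its own copy. Equal channel width per level is assumed.
    """
    per_stack = num_convs * conv_params(channels, channels)
    return per_stack if shared else per_stack * num_levels


if __name__ == "__main__":
    p3 = [1.0, 2.0]  # toy stand-ins for two pyramid features
    p4 = [3.0, 4.0]
    print(fuse_weighted([p3, p4], raw_weights=[1.0, 3.0]))
    print(head_conv_params(64, num_levels=3, shared=False))
    print(head_conv_params(64, num_levels=3, shared=True))
```

Under these assumptions, sharing the stack across three levels cuts its weight count to one third. The real saving depends on the actual channel widths and branch layout, which this sketch does not model.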
Experiments are conducted on a self-built transmission line foreign object dataset. The dataset contains 4,200 images and 8,207 fully annotated targets, covering mountain and urban backgrounds. It includes 2,103 bird nests, 2,560 kites, 1,648 waste objects, and 1,896 balloons, split into 2,940 training images, 840 validation images, and 420 test images. Under identical training and testing conditions, MBL-YOLO achieves 97.5% Precision, 97.1% Recall, 97.3% mAP50, and 70.4% mAP50-95, with 2.08M parameters, 5.8 GFLOPs, and an inference speed of 175.6 FPS. Compared with YOLOv8n, mAP50-95 increases by 1.9 percentage points, the parameter count and computational cost decrease by about 30.9% and 29.3%, respectively, and FPS rises from 168.6 to 175.6. Compared with YOLOv9t, YOLOv10n, YOLOv11n, YOLOv12n, Gold-YOLO, YOLO-World, D-Fine-N, and DEIM-D-Fine-N, MBL-YOLO obtains the highest mAP50-95 at the lowest computational cost and with a relatively small parameter count, indicating that the improvement does not rely on model stacking. Ablation experiments show that C2F-MDFB, BI-FPN, and Detect-LSCD improve dynamic representation, weighted fusion, and lightweight prediction, respectively, and are complementary when combined. To verify cross-scene generalization, experiments are also conducted on the public VisDrone2019 dataset. MBL-YOLO achieves 41.7% Precision, 31.1% Recall, 31.2% mAP50, and 18.6% mAP50-95, improving on YOLOv8n by 4.3, 5.5, 4.3, and 2.1 percentage points, respectively. Visual results show that the model reduces missed detections, false detections, and redundant boxes in small-object, multi-class, dense-background, and occlusion scenes, demonstrating good feature retention and scene transfer capability.
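The figures reported above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text. The baseline values it prints are implied by the stated differences, not reported directly, and the variable and function names are illustrative.

```python
# Consistency check on the reported dataset and benchmark figures.
# Every input is taken from the abstract; "implied" values are derived.

class_counts = {"bird nest": 2103, "kite": 2560, "waste": 1648, "balloon": 1896}
split_sizes = {"train": 2940, "val": 840, "test": 420}

total_targets = sum(class_counts.values())  # stated total: 8,207
total_images = sum(split_sizes.values())    # stated total: 4,200
split_ratio = {k: round(v / total_images, 3) for k, v in split_sizes.items()}


def implied_baseline(value, relative_reduction):
    """Baseline that, reduced by `relative_reduction`, yields `value`."""
    return value / (1.0 - relative_reduction)


params_baseline_m = implied_baseline(2.08, 0.309)    # millions of parameters
gflops_baseline = implied_baseline(5.8, 0.293)
fps_relative_gain = (175.6 - 168.6) / 168.6

# VisDrone2019: reported score minus reported gain (percentage points).
visdrone_mbl = {"P": 41.7, "R": 31.1, "mAP50": 31.2, "mAP50-95": 18.6}
visdrone_gain = {"P": 4.3, "R": 5.5, "mAP50": 4.3, "mAP50-95": 2.1}
visdrone_baseline = {
    k: round(visdrone_mbl[k] - visdrone_gain[k], 1) for k in visdrone_mbl
}

if __name__ == "__main__":
    print(total_targets, total_images, split_ratio)
    print(round(params_baseline_m, 2), round(gflops_baseline, 2))
    print(f"{fps_relative_gain:.1%}")
    print(visdrone_baseline)
```

The class counts sum to the stated 8,207 targets, the splits sum to 4,200 images in a 70/20/10 ratio, and the FPS increase from 168.6 to 175.6 corresponds to a relative gain of roughly 4%.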
To verify engineering feasibility, MBL-YOLO is deployed on a UAV platform equipped with an NVIDIA Jetson TX2 and a ZED2 stereo camera. The model is exported to ONNX and then optimized with TensorRT 8.2 through FP16 half-precision quantization and operator fusion to generate a native TX2 inference engine. With a 640×640 input resolution, a batch size of 1, and the TX2 in Max-P mode, the end-to-end latency of MBL-YOLO is 38.5 ms per frame, and the overall frame rate remains stable at 26 FPS. Under the same conditions, YOLOv8n reaches 55.1 ms and 18 FPS, so the deployed frame rate is about 44.4% higher (26 versus 18 FPS). The average power consumption of the TX2 and ZED2 working together is about 9.2 W, less than 3% of the UAV's flight power consumption. Overall, MBL-YOLO improves detection accuracy, recall, and real-time performance while reducing parameters and computation. It is suitable for UAV edge inspection platforms and offers a detection solution that balances accuracy, efficiency, and engineering practicality for automatic foreign object recognition, abnormal-object warning, and intelligent operation and maintenance of transmission lines, while also laying a foundation for future multimodal perception and online inspection system integration.
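The deployment figures can be related to one another with a short calculation. The sketch below assumes that frame rate is approximately the reciprocal of end-to-end latency, which ignores any pipelining or capture overhead in the real system. All inputs come from the text, and the names are illustrative.

```python
# Relating the reported TX2 latency, frame rate, and power figures.
# Assumption: FPS is roughly 1000 / latency_ms (no pipelining modelled).

def fps_from_latency(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


mbl_latency_ms, base_latency_ms = 38.5, 55.1
mbl_fps_reported, base_fps_reported = 26, 18

mbl_fps_implied = fps_from_latency(mbl_latency_ms)    # close to 26 FPS
base_fps_implied = fps_from_latency(base_latency_ms)  # close to 18 FPS

# Relative speed-up computed two ways.
gain_from_fps = mbl_fps_reported / base_fps_reported - 1.0
gain_from_latency = base_latency_ms / mbl_latency_ms - 1.0
latency_reduction = 1.0 - mbl_latency_ms / base_latency_ms

# "Less than 3% of flight power" puts a lower bound on flight power.
payload_power_w = 9.2
min_flight_power_w = payload_power_w / 0.03

if __name__ == "__main__":
    print(round(mbl_fps_implied, 2), round(base_fps_implied, 2))
    print(f"{gain_from_fps:.1%} {gain_from_latency:.1%} {latency_reduction:.1%}")
    print(round(min_flight_power_w, 1))
```

The 44.4% figure follows from the rounded frame rates (26 versus 18 FPS). Computed from the latencies instead, the speed-up is about 43%, equivalent to a per-frame latency reduction of about 30%. The power statement implies a flight power consumption above roughly 307 W.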