Computer Engineering

   

A Power Inspection Image Retrieval Method Based on Frequency-Domain Collaboration and Multi-Scale Gating

  

  • Published: 2026-06-03


Abstract:

Unmanned aerial vehicle (UAV) power inspection images often contain cluttered backgrounds and targets of widely varying scale, both of which limit retrieval accuracy. To address these problems, this paper proposes Swin-FMG, a power image retrieval network based on frequency-domain coordinate collaboration and multi-scale gating, with Swin Transformer as its backbone. First, a Frequency-domain Coordinate Collaborative Attention (FCCA) mechanism combines global spectrum filtering with orthogonal spatial projection to suppress environmental noise and restore the physical continuity of the target's geometric features. Second, a Semantic-Guided Multi-Scale Convolutional Gated Fusion (MSCGF) module uses deep semantics to adaptively filter shallow multi-scale textures and constructs a dual-stream retrieval representation, which strengthens the model's ability to handle cross-view scale changes. Finally, Low-Rank Adaptation (LoRA) fine-tuning and a joint loss function with hard-sample triplets mitigate the risk of overfitting on small samples and improve the inter-class separability of the feature metric space. On a self-built power inspection image retrieval dataset, Swin-FMG reaches a mean Average Precision (mAP) of 63.15% and a Recall@1 of 71.04%, improving mAP by 4.19% over the baseline Swin Transformer. These results indicate that Swin-FMG removes complex environmental interference and captures scale-invariant features, significantly improving image retrieval performance for power equipment while maintaining computational efficiency.
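The abstract reports mAP and Recall@1 without describing the evaluation protocol. As a rough illustration of how these two retrieval metrics are commonly computed, the sketch below ranks a gallery by cosine similarity for each query. The function name `retrieval_metrics`, the cosine-similarity ranking, and the separate query/gallery split are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def retrieval_metrics(query_feats, query_labels, gallery_feats, gallery_labels):
    """Return (mAP, Recall@1) for cosine-similarity retrieval (illustrative)."""
    query_feats = np.asarray(query_feats, dtype=float)
    gallery_feats = np.asarray(gallery_feats, dtype=float)
    query_labels = np.asarray(query_labels)
    gallery_labels = np.asarray(gallery_labels)

    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T  # shape: (num_queries, num_gallery)

    average_precisions, top1_hits = [], []
    for i in range(sims.shape[0]):
        order = np.argsort(-sims[i], kind="stable")  # most similar first
        relevant = gallery_labels[order] == query_labels[i]
        if not relevant.any():
            continue  # skip queries with no relevant gallery image
        hit_ranks = np.flatnonzero(relevant) + 1  # 1-based ranks of the hits
        # Precision measured at each rank where a relevant image appears.
        precision_at_hits = np.arange(1, hit_ranks.size + 1) / hit_ranks
        average_precisions.append(precision_at_hits.mean())
        top1_hits.append(float(relevant[0]))
    return float(np.mean(average_precisions)), float(np.mean(top1_hits))


# Toy data: 2 queries, 4 gallery images, 2-D features (made up for the example).
gallery = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
gallery_ids = [0, 1, 0, 1]
queries = [[1.0, 0.0], [0.0, 1.0]]
query_ids = [0, 0]
mean_ap, recall_at_1 = retrieval_metrics(queries, query_ids, gallery, gallery_ids)
```

Under this definition, Recall@1 is the fraction of queries whose top-ranked gallery image shares the query's class, while mAP also rewards placing every relevant image near the top of the ranking.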
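The abstract mentions a joint loss with hard-sample triplets but does not state the mining strategy or the other loss terms. One widely used variant is batch-hard mining, in which each anchor is paired with its farthest same-class sample and its nearest different-class sample within the batch. The sketch below covers only that triplet term; the function name `batch_hard_triplet_loss`, the Euclidean distance, and the margin of 0.3 are assumptions for illustration and may differ from the authors' formulation.

```python
import numpy as np


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss over one mini-batch (illustrative sketch)."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances, shape (batch, batch).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    positive_mask = same & ~np.eye(len(labels), dtype=bool)
    negative_mask = ~same

    # Hardest positive: the farthest sample with the same label.
    hardest_pos = np.where(positive_mask, dist, -np.inf).max(axis=1)
    # Hardest negative: the closest sample with a different label.
    hardest_neg = np.where(negative_mask, dist, np.inf).min(axis=1)

    # Only anchors with at least one positive and one negative contribute.
    valid = positive_mask.any(axis=1) & negative_mask.any(axis=1)
    if not valid.any():
        return 0.0
    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean())


# Toy batch: two classes whose samples interleave along one axis (made-up data).
toy_embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
toy_labels = [0, 0, 1, 1]
toy_loss = batch_hard_triplet_loss(toy_embeddings, toy_labels)
```

Minimising this term pulls same-class embeddings together and pushes different-class embeddings at least `margin` further away, which is one way to improve the inter-class separability that the abstract refers to.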
