A Power Inspection Image Retrieval Method Based on Frequency Domain Collaboration and Multi-Scale Gating

doi:10.19678/j.issn.1000-3428.0260250

Abstract

Unmanned aerial vehicle (UAV) power inspection images often contain cluttered backgrounds and targets whose scale varies widely, which limits image retrieval accuracy. To address these problems, this paper proposes Swin-FMG, a power image retrieval network based on frequency-domain coordinate collaboration and multi-scale gating, with Swin Transformer as the backbone. First, a Frequency-domain Coordinate Collaborative Attention (FCCA) mechanism is proposed. By combining global spectral filtering with orthogonal spatial projection, FCCA suppresses environmental noise and restores the physical continuity of target geometric features. Second, a Semantic-Guided Multi-Scale Convolutional Gated Fusion (MSCGF) module is designed. MSCGF uses deep semantic features to adaptively select shallow multi-scale textures and builds a dual-stream retrieval representation, which strengthens the model's ability to perceive targets under cross-view scale changes. Finally, Low-Rank Adaptation (LoRA) fine-tuning and a joint loss function incorporating hard-sample triplets are introduced; together they reduce the risk of overfitting on the small-sample dataset and further improve inter-class separability in the feature metric space. On a self-built power inspection image retrieval dataset, Swin-FMG reaches a mean Average Precision (mAP) of 63.15% and a Recall@1 of 71.04%, and its mAP is 4.19% higher than that of the baseline Swin Transformer. These results show that Swin-FMG effectively removes complex environmental interference and captures scale-invariant features, significantly improving power equipment image retrieval while maintaining computational efficiency, which verifies the effectiveness of the proposed method.
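The abstract says only that a joint loss with hard-sample triplets is used; it does not give the companion loss term, the margin, or the mining rule. As a rough illustration, the sketch below implements the common batch-hard triplet formulation, in which each anchor in a mini-batch is paired with its farthest same-class sample and its closest different-class sample. The function name `batch_hard_triplet_loss`, the Euclidean distance, and the margin of 0.3 are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence


def batch_hard_triplet_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    margin: float = 0.3,  # assumed value; the abstract gives no margin
) -> float:
    """Hypothetical batch-hard triplet loss (mining rule assumed, not confirmed).

    For every anchor, take its farthest same-class sample (hardest positive)
    and its closest different-class sample (hardest negative), then apply a
    hinge with the given margin. Returns the mean over anchors that have at
    least one positive and one negative in the batch.
    """
    n = len(embeddings)
    # Full pairwise Euclidean distance matrix.
    dist = [[math.dist(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    losses: List[float] = []
    for i in range(n):
        pos = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [dist[i][j] for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        hardest_pos = max(pos)  # farthest same-class sample
        hardest_neg = min(neg)  # closest different-class sample
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0
```

Plain Python is used here only for readability; in training, the same computation would run on framework tensors so that gradients reach the backbone. In a joint objective this term would usually be added, with some weight, to another loss such as a classification loss; which term the paper pairs it with, and with what weight, is not stated in the abstract.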

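The abstract reports mAP and Recall@1 without spelling out the evaluation protocol. The sketch below shows the standard definitions under assumed conventions: each query ranks the whole gallery by ascending distance, a gallery item counts as relevant when it shares the query's class label, AP averages the precision at the rank of every relevant item, and Recall@1 is the fraction of queries whose top-ranked item is relevant. The function names `average_precision` and `evaluate` and the input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP for one query: mean of precision@k taken at each relevant rank k."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def evaluate(
    query_labels: Sequence[int],
    gallery_labels: Sequence[int],
    distances: Sequence[Sequence[float]],  # distances[q][g]; smaller = more similar
) -> Tuple[float, float]:
    """Return (mAP, Recall@1) under the assumed label-match relevance rule."""
    aps: List[float] = []
    top1_hits = 0
    for q, q_label in enumerate(query_labels):
        # Rank the whole gallery for this query by ascending distance.
        order = sorted(range(len(gallery_labels)), key=lambda g: distances[q][g])
        relevance = [gallery_labels[g] == q_label for g in order]
        aps.append(average_precision(relevance))
        top1_hits += int(relevance[0])  # Recall@1: is the first result relevant?
    n = len(query_labels)
    return sum(aps) / n, top1_hits / n
```

For example, `evaluate([0, 1], [0, 1, 0], [[0.1, 0.2, 0.9], [0.5, 0.3, 0.1]])` gives an mAP of 2/3 (the two queries have AP 5/6 and 1/2) and a Recall@1 of 0.5. Whether the paper excludes the query image from its own gallery is not stated, so that detail is left out of the sketch.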

Wang Lihui, Li Yuan, Liu Zefeng, Wei Yachuan. A Power Inspection Image Retrieval Method Based on Frequency Domain Collaboration and Multi-Scale Gating[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260250.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0260250
