
Computer Engineering ›› 2026, Vol. 52 ›› Issue (2): 167-176. doi: 10.19678/j.issn.1000-3428.0070134

• Computer Vision and Image Processing •

Few-shot Image Classification Method Based on Salient Position Interaction Transformer

SONG Chaoqi1, LIU Ying1,2,3, HE Jinglu1,2,3, LI Daxiang1,2,3   

  1. Center for Image and Information Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China;
    2. Shaanxi Provincial Forensic Science Electronic Information Experimental Research Center, Xi'an 710121, Shaanxi, China;
    3. Shaanxi Provincial International Joint-Research Center for Wireless Communication and Information Processing, Xi'an 710121, Shaanxi, China
  • Received: 2024-07-17 Revised: 2024-08-26 Published: 2024-11-19

  • About the authors: SONG Chaoqi, male, master's student; main research interest: few-shot learning; E-mail: scqsdjn@163.com. LIU Ying, professor; HE Jinglu, lecturer; LI Daxiang, associate professor.
  • Funding:
    National Natural Science Foundation of China (62301427).

Abstract: Image classification, a fundamental task in computer vision, has achieved remarkable results on large-scale datasets. However, traditional deep learning methods tend to overfit when samples are scarce, which degrades a model's generalization ability. To address this issue, this study presents a novel few-shot image classification method that improves classification performance when sample data are limited. The method builds on a salient position interaction Transformer and a target classifier, leveraging the structure and strengths of the Vision Transformer (ViT) model. An interaction multi-head self-attention (HI-MHSA) module with salient position selection is introduced: it increases the interaction between the attention heads of the multi-head self-attention module, strengthens the model's focus on salient regions of the input image, and saves computational resources, while the supervision and guidance of the target classifier further improve the model's learning efficiency and accuracy. Experimental results show that on the miniImageNet, tieredImageNet, and CUB datasets, the proposed method achieves classification accuracies of approximately 67.09%, 72.07%, and 79.82% on the 5-way 1-shot task and approximately 83.54%, 85.62%, and 90.35% on the 5-way 5-shot task, respectively. The proposed method therefore performs well and is highly practical for few-shot image classification tasks.
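The abstract describes two ideas in combination: restricting attention to salient positions (saving computation) and letting attention heads interact with one another. The paper's exact HI-MHSA formulation is not given here, so the following is only an illustrative NumPy sketch under stated assumptions: saliency is approximated by token L2 norm, the projection weights are random stand-ins for learned parameters, and head interaction is modeled as a simple learned mixing matrix across head outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def salient_interacting_mhsa(x, num_heads=4, k=8, seed=0):
    """Illustrative sketch of salient-position multi-head self-attention
    with cross-head interaction (NOT the paper's exact HI-MHSA).

    x: (n_tokens, dim) token embeddings.
    Saliency is approximated by token L2 norm; the paper's actual
    selection criterion may differ.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    dh = d // num_heads

    # 1) Salient position selection: keep only the top-k tokens by norm
    #    as keys/values, so attention costs O(n*k) instead of O(n*n).
    saliency = np.linalg.norm(x, axis=1)
    idx = np.argsort(saliency)[-k:]
    xs = x[idx]                                   # (k, d)

    # Random projections stand in for the learned Wq/Wk/Wv matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q  = (x  @ Wq).reshape(n, num_heads, dh)
    kk = (xs @ Wk).reshape(k, num_heads, dh)
    v  = (xs @ Wv).reshape(k, num_heads, dh)

    # 2) Per-head scaled dot-product attention over salient keys only.
    heads = []
    for h in range(num_heads):
        attn = softmax(q[:, h] @ kk[:, h].T / np.sqrt(dh))  # (n, k)
        heads.append(attn @ v[:, h])                        # (n, dh)
    out = np.stack(heads, axis=1)                           # (n, heads, dh)

    # 3) Head interaction: mix information across heads before the
    #    usual concatenation, instead of keeping heads independent.
    mix = rng.standard_normal((num_heads, num_heads)) / num_heads
    out = np.einsum('hg,ngd->nhd', mix, out)
    return out.reshape(n, d)

tokens = np.random.default_rng(1).standard_normal((16, 32))
y = salient_interacting_mhsa(tokens, num_heads=4, k=8)
print(y.shape)  # (16, 32): one d-dimensional output per input token
```

In standard multi-head self-attention the heads operate independently until concatenation; the mixing step in (3) is one simple way to realize "interaction between attention heads", and restricting keys/values to the top-k positions in (1) is one way salient position selection can reduce compute.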

Key words: few-shot image classification, Transformer, attention mechanism, few-shot classifier, salient position selection

