
Computer Engineering (计算机工程)


Axial Attention and Scale-Awareness for Few-Shot Fine-Grained Image Classification

  • Online: 2025-06-20 Published: 2025-06-20

Abstract: In fine-grained image classification, abundant samples provide rich local feature information. In few-shot scenarios, however, data sparsity makes it difficult for a model to capture discriminative local information. To address this problem, a few-shot learning method that combines axial attention with a scale-aware mechanism is proposed. First, a frequency-adaptive feature selection module is designed to suppress interference from background noise and non-target regions and to highlight discriminative local features, thereby increasing feature separability between categories. Second, an axial-scale joint enhancement module is constructed; it integrates global contextual information, focuses on key regions, and processes features with different receptive fields in parallel, strengthening the representation of details at different scales. Finally, a dual similarity measurement module guides learning with two similarity metrics, improving feature generalization and reducing bias toward specific features. On the public datasets CUB_200_2011 and Stanford Dogs, the method improves classification accuracy in the 1-shot and 5-shot scenarios by 1.4 and 1.45 percentage points and by 1.86 and 3.49 percentage points, respectively. On the Stanford Cars dataset, it achieves the best performance in the 1-shot scenario and competitive results in the 5-shot scenario. The experimental results show that the method effectively improves few-shot fine-grained image classification and captures discriminative feature information more reliably.
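For readers unfamiliar with the axial attention named in the title, the following is a minimal NumPy sketch of the general technique only: self-attention is applied along the height axis and then along the width axis of a feature map, so each position attends to its column and row rather than to every other position. It is not the paper's axial-scale joint enhancement module. The function names (`attend_along_axis`, `axial_attention`), the (H, W, C) layout, and the use of the raw features as queries, keys, and values (no learned projections, heads, or residual connections) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend_along_axis(feat, axis):
    """Self-attention restricted to one spatial axis of an (H, W, C) map.

    Assumption: queries, keys and values are the features themselves;
    a trained model would use learned linear projections instead.
    """
    # Put the attended axis at position 1, so that every slice along
    # axis 0 is an independent sequence of C-dimensional vectors.
    x = np.moveaxis(feat, axis, 1)                  # (batch, length, C)
    scale = np.sqrt(x.shape[-1])
    scores = x @ np.swapaxes(x, 1, 2) / scale       # (batch, length, length)
    out = softmax(scores, axis=-1) @ x              # (batch, length, C)
    return np.moveaxis(out, 1, axis)                # back to (H, W, C)


def axial_attention(feat):
    """Attend along the height axis (0), then along the width axis (1)."""
    return attend_along_axis(attend_along_axis(feat, 0), 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(4, 5, 8))               # toy H=4, W=5, C=8 map
    print(axial_attention(feat).shape)
```

Under these assumptions, the two passes build H·W·(H + W) attention scores in total, compared with (H·W)² for full two-dimensional self-attention, which is the usual motivation for factorizing attention along axes.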