Axial Attention and Scale-Awareness for Few-Shot Fine-Grained Image Classification

doi:10.19678/j.issn.1000-3428.0252234

Abstract

In fine-grained image classification, sufficient samples provide rich local feature information. In few-shot scenarios, however, data sparsity makes it difficult for a model to fully capture discriminative local information. To address this problem, a few-shot learning method that combines axial attention with a scale-aware mechanism is proposed. First, a frequency-adaptive feature selection module is designed to reduce interference from background noise and non-target regions and to highlight discriminative local features, thereby increasing the feature separability between categories. Second, an axial-scale joint enhancement module is constructed, which fuses global contextual information, focuses on key regions, and processes features with different receptive fields in parallel to strengthen the representation of details at different scales. Finally, a dual similarity measurement module guides learning with two similarity measures, which improves the generalization of the features and reduces bias toward specific features. On the public datasets CUB_200_2011 and Stanford Dogs, the proposed method improves classification accuracy in the 1-shot and 5-shot scenarios by 1.4 and 1.45 percentage points and by 1.86 and 3.49 percentage points, respectively. On the Stanford Cars dataset, it achieves the best performance in the 1-shot scenario and competitive results in the 5-shot scenario. The experimental results show that the proposed method effectively improves few-shot fine-grained image classification and better captures discriminative feature information.
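The abstract does not state how axial attention is computed inside the axial-scale joint enhancement module. As a rough illustration of the general technique only, the sketch below applies self-attention along one spatial axis at a time (first within each row, then within each column) on a small H x W x C feature map. It assumes a single head and no learned projections (queries, keys, and values are all the input features); the function names `attend_1d` and `axial_attention` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_1d(seq):
    """Scaled dot-product self-attention over a sequence of C-dim vectors.

    Assumption for illustration: Q = K = V = seq (no learned projections).
    """
    scale = math.sqrt(len(seq[0]))
    out = []
    for q in seq:
        # Similarity of this position to every position on the same axis.
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        )
        # Weighted average of the values, channel by channel.
        out.append(
            [sum(w * v[c] for w, v in zip(weights, seq)) for c in range(len(q))]
        )
    return out


def axial_attention(fmap):
    """Attend along the width axis (per row), then the height axis (per column).

    fmap is a nested list of shape H x W x C; the output has the same shape.
    """
    h, w = len(fmap), len(fmap[0])
    rows = [attend_1d(row) for row in fmap]  # width axis
    cols = [attend_1d([rows[i][j] for i in range(h)]) for j in range(w)]  # height axis
    return [[cols[j][i] for j in range(w)] for i in range(h)]  # back to H x W x C


if __name__ == "__main__":
    fmap = [
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]],
    ]
    out = axial_attention(fmap)
    print(len(out), len(out[0]), len(out[0][0]))  # shape is preserved: 2 3 2
```

Attending along rows and then columns lets every position draw on context from the whole map after two passes, while each pass only compares positions that share a row or a column. That is the usual motivation for axial attention over full two-dimensional self-attention.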

Gao Xiong, Gou Guanglei, Zhou Linjie, Jia Penghao. Axial Attention and Scale-Awareness for Few-Shot Fine-Grained Image Classification[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252234.

