Computer Engineering ›› 2026, Vol. 52 ›› Issue (2): 158-166. doi: 10.19678/j.issn.1000-3428.0070136

• Computer Vision and Graphics and Image Processing •

Few-shot Fine-grained Image Classification Based on Neighborhood Fusion and Feature Enhancement

WEN Lang, GOU Guanglei, BAI Ruifeng, MIAO Wanyu

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received:2024-07-17 Revised:2024-09-03 Published:2024-11-14
  • About the authors: WEN Lang (CCF student member), male, master's student, whose main research interest is few-shot fine-grained image learning; GOU Guanglei (corresponding author), lecturer, Ph.D., E-mail: ggl@cqut.edu.cn; BAI Ruifeng and MIAO Wanyu, master's students.
  • Supported by:
    National Natural Science Foundation of China (62141201); Science and Technology Project of Chongqing Municipal Education Commission (KJZD-M202201102); Chongqing University of Technology Graduate Innovation Project 2024 (gzlcx20243182).

Abstract: Fine-grained image classification currently faces three challenges: annotation is difficult, samples are scarce, and differences between categories are subtle. To address these issues, a few-shot fine-grained image classification method based on neighborhood fusion and feature enhancement is proposed. First, the Discrete Cosine Transform (DCT) and a channel attention mechanism are used to capture global and local image information, respectively, and the two kinds of features are concatenated along the channel dimension. Combining spatial-domain and frequency-domain feature extraction in this way increases the diversity of sample features and improves the model's generalization ability. Second, a feature enhancement module computes the correlation between query samples and support class prototypes and generates adaptive weights, so that query information guides and complements the fine-grained learning of support sample images. This process captures the differences between images of the same class while suppressing local similarities between images of different classes. Finally, a dual-similarity measurement module measures the correlation scores between the support class prototypes and the images to be classified, yielding more accurate classification. Experimental results show that, on 5-shot tasks over four public datasets (Mini-ImageNet, CUB-200-2011, Stanford Dogs, and Stanford Cars), the method achieves accuracies of 79.22%, 87.47%, 79.23%, and 83.71%, respectively, outperforming the compared methods.
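The first step of the abstract, a frequency-domain branch and a channel-attention branch concatenated along the channel dimension, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the low-pass DCT filtering, the SE-style gating, and the names `frequency_branch`, `channel_attention`, `fused_features`, `w1`, `w2`, and `keep` are all assumptions chosen for illustration, since the abstract does not specify the layer design.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine frequency."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthogonal
    return basis


def frequency_branch(fmap: np.ndarray, keep: int) -> np.ndarray:
    """Low-pass every channel of a (C, H, W) map in the DCT domain.

    Assumption: "global information" is approximated by keeping only the
    keep x keep lowest-frequency coefficients.
    """
    c_h, c_w = dct_matrix(fmap.shape[1]), dct_matrix(fmap.shape[2])
    coeffs = c_h @ fmap @ c_w.T            # forward 2-D DCT per channel
    mask = np.zeros_like(coeffs)
    mask[:, :keep, :keep] = 1.0            # keep the lowest frequencies only
    return c_h.T @ (coeffs * mask) @ c_w   # inverse 2-D DCT


def channel_attention(fmap: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """SE-style gating (assumed design): pool, two linear maps, sigmoid gate."""
    squeezed = fmap.mean(axis=(1, 2))             # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)       # ReLU, (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, (C,)
    return fmap * gate[:, None, None]


def fused_features(fmap, w1, w2, keep=2):
    """Concatenate both branches along the channel axis: (2C, H, W)."""
    return np.concatenate(
        [frequency_branch(fmap, keep), channel_attention(fmap, w1, w2)], axis=0
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((8, 6, 6))   # toy feature map, C=8, H=W=6
    w1 = rng.standard_normal((2, 8))        # hypothetical reduction weights
    w2 = rng.standard_normal((8, 2))        # hypothetical expansion weights
    print(fused_features(fmap, w1, w2).shape)
```

In a trained model the gating weights would be learned and the branches would operate on backbone features; the sketch only shows how the two views can be computed and stacked.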

Key words: few-shot learning, fine-grained image classification, frequency domain learning, feature enhancement, attention mechanism, metric learning
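The metric-learning step named in the keywords, scoring a query against support class prototypes with two similarity measures, can be sketched as follows. The abstract does not say which two measures the dual-similarity module combines, so the pairing of cosine similarity with negative Euclidean distance, the blending weight `alpha`, and the function names are assumptions; the query-guided feature enhancement step is omitted here.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def class_prototypes(support: np.ndarray, labels: np.ndarray, n_way: int) -> np.ndarray:
    """Average the support embeddings of each class: (n_way, D)."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])


def dual_similarity(query: np.ndarray, prototypes: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend cosine similarity with negative Euclidean distance: (Q, n_way).

    Assumption: this particular pair of measures is illustrative only.
    """
    cosine = l2_normalize(query) @ l2_normalize(prototypes).T
    distance = np.linalg.norm(query[:, None, :] - prototypes[None, :, :], axis=-1)
    return alpha * cosine - (1.0 - alpha) * distance


def classify(query, support, labels, n_way, alpha=0.5):
    """Assign each query to the class whose prototype scores highest."""
    prototypes = class_prototypes(support, labels, n_way)
    return dual_similarity(query, prototypes, alpha).argmax(axis=1)


if __name__ == "__main__":
    # Toy 5-way 5-shot episode with synthetic, well-separated embeddings.
    rng = np.random.default_rng(0)
    n_way, k_shot, dim = 5, 5, 16
    centers = 10.0 * np.eye(n_way, dim)
    labels = np.repeat(np.arange(n_way), k_shot)
    support = centers[labels] + 0.1 * rng.standard_normal((n_way * k_shot, dim))
    query = centers + 0.1 * rng.standard_normal((n_way, dim))
    print(classify(query, support, labels, n_way))
```

The embeddings here are synthetic; in practice they would come from the feature extractor, and the relative weight of the two measures would be tuned or learned.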
