Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 271-278. doi: 10.19678/j.issn.1000-3428.0066426

• Graphics and Image Processing •

多区域注意力的细粒度图像分类网络

白尚旺1,*, 王梦瑶1, 胡静1, 陈志泊2

  1. 太原科技大学计算机科学与技术学院, 山西 太原 030024
    2. 北京林业大学信息学院, 北京 100091
  • 收稿日期:2022-12-02 出版日期:2024-01-15 发布日期:2024-01-11
  • 通讯作者: 白尚旺
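  • Funding:
    National Natural Science Foundation of China (32071775); Natural Science Foundation of Shanxi Province (202203021211189); Doctoral Scientific Research Start-up Fund (20202057)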

Multi-Region Attention Network for Fine-Grained Image Classification

Shangwang BAI1,*, Mengyao WANG1, Jing HU1, Zhibo CHEN2

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
    2. School of Information Science and Technology, Beijing Forestry University, Beijing 100091, China
  • Received: 2022-12-02 Online: 2024-01-15 Published: 2024-01-11
  • Contact: Shangwang BAI

摘要:

目前细粒度图像分类的难点在于如何精准定位图像中高度可辨的局部区域以及其他辅助判别特征。提出一种多区域注意力的细粒度图像分类网络来解决这个问题。首先使用Inception-V3对图像特征进行提取,通过重复使用注意力擦除的方法使模型关注次要特征;然后通过背景去除以及上采样的方法获取图像更精准的局部图像,对提取到的局部特征进行位置统计,并以矩形框的方式获取图像整体,减少细节信息丢失;最后对局部与整体图像进行更加细致的学习。此外,设计联合损失函数,通过动态平衡难易样本和缩小类内差距的方法改善模型的识别效果。实验结果表明,该方法在公开的细粒度图像数据集CUB-200-2011、Stanford-Cars和FGVC-Aircraft上的准确率分别达到89.2%、94.8%、94.0%,相较于对比方法性能更优。

关键词: 多区域注意力, 细粒度图像分类, 擦除策略, 联合损失, 深度学习, 卷积神经网络

Abstract:

The main difficulty in fine-grained image classification is accurately localizing the highly discriminative local regions of an image together with other auxiliary discriminative features. To address this problem, a multi-region attention network for fine-grained image classification is proposed. First, Inception-V3 is used to extract image features, and attention erasure is applied repeatedly so that the model also attends to secondary features. Next, more precise local images are obtained through background removal and up-sampling; the positions of the extracted local features are then aggregated, and the overall image region is obtained as a rectangular bounding box, which reduces the loss of detail. Finally, the local and overall images are learned in finer detail. In addition, a joint loss function is designed that improves recognition by dynamically balancing hard and easy samples and by reducing intra-class differences. Experimental results show that the method achieves accuracies of 89.2%, 94.8%, and 94.0% on the public fine-grained image datasets CUB-200-2011, Stanford-Cars, and FGVC-Aircraft, respectively, outperforming the compared methods.
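
The erasure and bounding-box steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `erase_attended_region` and `bounding_box`, the min-max normalization, and the 0.5 threshold are all assumptions made for illustration, and the attention maps are hand-made rather than produced by Inception-V3. In the actual network, the attention map would presumably be recomputed on the erased image at each round.

```python
import numpy as np


def erase_attended_region(image, attention, threshold=0.5):
    """Zero out pixels whose normalized attention exceeds `threshold`.

    image: (H, W, C) array; attention: (H, W) non-negative array.
    Returns the erased image and the boolean mask of the erased region.
    The threshold value is an illustrative assumption.
    """
    low, high = attention.min(), attention.max()
    if high > low:
        # Min-max normalize the attention map to [0, 1].
        norm = (attention - low) / (high - low)
    else:
        # A flat map carries no peak, so nothing is erased.
        norm = np.zeros_like(attention, dtype=float)
    mask = norm > threshold
    erased = image.copy()
    erased[mask] = 0  # hide the most attended region from the next round
    return erased, mask


def bounding_box(masks):
    """Smallest rectangle covering every mask.

    Returns (top, left, bottom, right) with bottom/right exclusive,
    or None if all masks are empty.
    """
    union = np.logical_or.reduce(masks)
    rows = np.flatnonzero(union.any(axis=1))
    cols = np.flatnonzero(union.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


# Toy example: two rounds of erasure with hand-made attention maps.
img = np.ones((6, 6, 3))
att1 = np.zeros((6, 6))
att1[1, 1], att1[1, 2] = 1.0, 0.8      # primary discriminative part
erased, m1 = erase_attended_region(img, att1)

att2 = np.zeros((6, 6))
att2[4, 3] = 2.0                        # secondary part found after erasure
_, m2 = erase_attended_region(erased, att2)

box = bounding_box([m1, m2])            # (1, 1, 5, 4)
```

Erasing the strongest response forces the next round to look elsewhere, which is how secondary regions come into play; the union rectangle then gives one crop that contains every located part.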

Key words: multi-region attention, fine-grained image classification, erasure strategy, joint loss, deep learning, convolutional neural network (CNN)