Computer Engineering, 2022, Vol. 48, Issue (2): 237-242, 249. doi: 10.19678/j.issn.1000-3428.0060111

• Graphics and Image Processing •

Fine-Grained Image Classification Combining Dual Semantic Data Augmentation and Target Location

TAN Run, YE Wujian, LIU Yijun

  1. School of Information Engineering, Guangdong University of Technology, Guangzhou 514000, China
  • Received: 2020-11-26 Revised: 2021-01-24 Published: 2021-02-03
  • About the authors: TAN Run (1996-), male, master's student, main research interest: fine-grained image classification; YE Wujian (corresponding author), lecturer, Ph.D.; LIU Yijun, professor, Ph.D.
  • Funding:
    Key-Area Research and Development Program of Guangdong Province (2018B030338001); Young Hundred Talents Program of Guangdong University of Technology (220413548).

Abstract: Fine-grained image classification aims to divide images that belong to the same basic category into more specific subcategories. Because such images show large intra-class differences and small inter-class differences, extracting discriminative local features is the key to the task. This paper proposes a fine-grained image classification algorithm that combines dual semantic data augmentation with target location. To fully extract discriminative local key features, two modules are built for the training phase, and each obtains data at a different semantic level. The first is an attention learning module based on Bilinear Attention Pooling (BAP), which captures local detail information of the target. The second is an information gain module based on the Convolutional Block Attention Module (CBAM), which captures the important contours of the target. Using these two kinds of data for dual semantic data augmentation improves model accuracy. In addition, a target location module is built for the testing phase so that the model focuses on the classification target as a whole, which further improves classification accuracy. Experimental results show that the proposed algorithm achieves classification accuracies of 89.5%, 93.6%, and 94.7% on the CUB-200-2011, FGVC Aircraft, and Stanford Cars datasets, respectively. It outperforms the baseline network Inception-V3, the BAP feature aggregation method, and algorithms such as B-CNN, RA-CNN, and MA-CNN.

Key words: fine-grained image classification, data augmentation, bilinear network, attention learning, target location
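The abstract names two mechanisms without spelling them out. The first is bilinear attention pooling (BAP) in the attention learning module. The second is attention-driven target location at test time. Below is a minimal NumPy sketch of both, and it rests on some assumptions. It assumes the common BAP formulation: each part feature is the spatial average of one attention map multiplied element-wise with the backbone feature maps, and the result is then signed-square-rooted and L2-normalised. It also assumes that the target box is the bounding box of a thresholded, min-max-normalised mean of the attention maps. The function names, the 0.5 threshold, and the grid-to-pixel scaling are hypothetical illustrations, not the authors' implementation. The CBAM-based information gain module and image resizing are not covered.

```python
from typing import Tuple

import numpy as np


def bilinear_attention_pooling(features: np.ndarray, attentions: np.ndarray) -> np.ndarray:
    """BAP-style pooling (illustrative sketch).

    features:   (C, H, W) backbone feature maps F.
    attentions: (M, H, W) attention maps A_k, one per object part.
    Returns an (M, C) matrix whose entry [k, c] is the spatial mean of A_k * F_c.
    """
    _, h, w = features.shape
    # Sum A_k(h, w) * F_c(h, w) over all spatial positions, then average.
    return np.einsum("mhw,chw->mc", attentions, features) / (h * w)


def to_descriptor(parts: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Flatten the part matrix, apply signed square root, then L2-normalise."""
    v = parts.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + eps)


def attention_bbox(attentions: np.ndarray, threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Box (top, left, bottom, right) on the feature grid, bottom/right exclusive,
    covering every cell whose min-max-normalised mean attention exceeds `threshold`.
    The threshold value is a hypothetical hyperparameter."""
    mean_map = attentions.mean(axis=0)
    lo, hi = mean_map.min(), mean_map.max()
    norm = (mean_map - lo) / (hi - lo + 1e-12)
    rows, cols = np.nonzero(norm > threshold)
    if rows.size == 0:  # flat map: fall back to the whole grid
        return 0, 0, mean_map.shape[0], mean_map.shape[1]
    return int(rows.min()), int(cols.min()), int(rows.max()) + 1, int(cols.max()) + 1


def crop_to_box(image: np.ndarray, box: Tuple[int, int, int, int],
                grid_hw: Tuple[int, int]) -> np.ndarray:
    """Scale a feature-grid box to image pixels and crop it (no resizing)."""
    top, left, bottom, right = box
    sy = image.shape[0] / grid_hw[0]
    sx = image.shape[1] / grid_hw[1]
    return image[int(top * sy):int(np.ceil(bottom * sy)),
                 int(left * sx):int(np.ceil(right * sx))]


# Toy usage: synthetic features and attention maps, not real model outputs.
rng = np.random.default_rng(0)
feats = rng.random((8, 14, 14))   # C = 8 channels on a 14 x 14 grid
atts = np.zeros((4, 14, 14))      # M = 4 attention maps
atts[:, 3:9, 5:12] = 1.0          # all parts respond inside one region
parts = bilinear_attention_pooling(feats, atts)
print(parts.shape)                # (4, 8)
box = attention_bbox(atts)
print(box)                        # (3, 5, 9, 12)
crop = crop_to_box(np.zeros((224, 224, 3)), box, (14, 14))
print(crop.shape)                 # (96, 112, 3)
```

A full pipeline would resize the crop back to the network input size and classify it a second time, possibly combining that prediction with the one from the full image. This refinement step is one plausible design, not something the abstract specifies.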