
Computer Engineering ›› 2023, Vol. 49 ›› Issue (5): 239-246, 254. doi: 10.19678/j.issn.1000-3428.0064396

• Graphics and Image Processing •

Fine-Grained Image Categorization Based on Adaptive Trilinear Pooling Network

SHI Jin, XU Yang, CAO Bin

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  • Received: 2022-04-07 Revised: 2022-06-27 Published: 2022-09-20
  • About the authors: SHI Jin (born 1995), male, master's student, whose main research interests are computer vision and metric learning; XU Yang (corresponding author), associate professor, Ph.D.; CAO Bin, professor, Ph.D.
  • Funding:
    Guizhou Provincial Science and Technology Plan Project (grant no. 黔合科支撑[2021] 一般176).

Abstract: The key to Fine-Grained Image Categorization (FGIC) is extracting the subtle features in an image. Most existing weakly supervised fine-grained image recognition methods use expert-labeled boundary annotations to help locate key regions, which leads to high labeling costs and a complex training process. The weakly supervised Bilinear Convolutional Neural Network (B-CNN) is effective to some extent because the feature space it learns better matches the characteristics of fine-grained images, but it ignores the interaction between layers. To address the difficulty of identifying key regions and the weak inter-layer interaction in fine-grained image recognition, an adaptive trilinear pooling network, ATP-Net, is proposed for FGIC tasks. It combines a second-order covariance channel attention mechanism, an Adaptive Feature Mask (AFM), and Adaptive Trilinear Pooling (ATP). The second-order covariance channel attention mechanism learns an attention vector over the channels, the AFM module learns an attention matrix over the spatial dimensions, and the ATP module learns the final feature representation, so that information in both the spatial and channel dimensions is fully used. Experimental results on three FGIC datasets, CUB-200, Cars-196, and Aircraft-100, show that ATP-Net achieves classification accuracies of 89.30%, 94.20%, and 91.80%, respectively.

Key words: Fine-Grained Image Categorization (FGIC), attention mechanism, feature mask, Adaptive Trilinear Pooling (ATP), Higher-Order Interaction (HOI)

CLC number:
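
The abstract refers to bilinear pooling (B-CNN) and to interaction between layers, but it gives no formulas. As a rough illustration only, the pure-Python sketch below shows generic bilinear pooling (the sum of outer products over spatial locations, followed by a signed square root and L2 normalization) and one simple way to let layers interact, by pooling every pair of feature maps taken from different layers. It is not ATP-Net's channel attention, feature mask, or trilinear pooling module; the function names, the pairwise scheme, and the toy data are assumptions made for this example.

```python
import math


def bilinear_pool(fa, fb):
    """Sum-pool the outer product of two feature maps over spatial locations.

    fa: list of N locations, each a list of Ca channel values.
    fb: list of N locations, each a list of Cb channel values.
    Returns a flat descriptor of length Ca * Cb (row-major).
    """
    if len(fa) != len(fb):
        raise ValueError("feature maps must cover the same spatial locations")
    ca, cb = len(fa[0]), len(fb[0])
    pooled = [0.0] * (ca * cb)
    for xa, xb in zip(fa, fb):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += xa[i] * xb[j]
    return pooled


def normalize(v, eps=1e-12):
    """Signed square root followed by L2 normalization."""
    rooted = [math.copysign(math.sqrt(abs(x)), x) for x in v]
    norm = math.sqrt(sum(x * x for x in rooted))
    return [x / (norm + eps) for x in rooted]


def cross_layer_descriptor(layers):
    """Hypothetical inter-layer interaction: concatenate the normalized
    bilinear descriptors of every pair of distinct layers.
    This pairwise scheme is an assumption, not the paper's ATP module."""
    parts = []
    for a in range(len(layers)):
        for b in range(a + 1, len(layers)):
            parts.extend(normalize(bilinear_pool(layers[a], layers[b])))
    return parts


if __name__ == "__main__":
    # Toy feature maps: 2 spatial locations, 2 channels each (made-up values).
    layer1 = [[1.0, 2.0], [3.0, 4.0]]
    layer2 = [[1.0, 0.0], [0.0, 1.0]]
    layer3 = [[0.5, 0.5], [1.0, -1.0]]
    print(bilinear_pool(layer1, layer2))  # [1.0, 3.0, 2.0, 4.0]
    print(len(cross_layer_descriptor([layer1, layer2, layer3])))
```

With C channels per layer and L layers, this pairwise descriptor has length C * C * L * (L - 1) / 2, which grows quickly; practical networks usually reduce the dimension before the classifier.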