Fine-Grained Image Categorization Based on Adaptive Trilinear Pooling Network

doi:10.19678/j.issn.1000-3428.0064396

Abstract

The key to Fine-Grained Image Categorization (FGIC) lies in extracting the subtle features of an image. Most existing fine-grained image recognition methods rely on strong supervision: they use expert-annotated bounding boxes to help locate key regions, which leads to high annotation costs and a complex training process. The weakly supervised Bilinear Convolutional Neural Network (B-CNN) is effective because the feature space it learns better matches the characteristics of fine-grained images, but it ignores the interaction between layers. To address the difficulty of identifying key regions and the weak inter-layer interaction in fine-grained image recognition, this paper proposes ATP-Net, an adaptive trilinear pooling network for FGIC that integrates a second-order covariance channel attention mechanism, an Adaptive Feature Mask (AFM), and Adaptive Trilinear Pooling (ATP). The second-order covariance channel attention mechanism learns an attention vector over the channels, the AFM module learns an attention matrix over the spatial dimensions, and the ATP module learns the final feature representation, so that information in both the spatial and channel dimensions is fully exploited. Experimental results on three FGIC datasets, CUB-200, Cars-196, and Aircraft-100, show that ATP-Net achieves classification accuracies of 89.30%, 94.20%, and 91.80%, respectively.

Key words: Fine-Grained Image Categorization (FGIC), attention mechanism, feature mask, Adaptive Trilinear Pooling (ATP), Higher-Order Interaction (HOI)
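The abstract names three components but gives no formulas. The NumPy sketch below shows one plausible way they could fit together; it is not the authors' implementation, and every design detail is an assumption. It assumes three feature maps of identical shape (for example, from consecutive convolutional layers of one backbone stage, as in hierarchical bilinear pooling); a simplified covariance-based channel gate (row means of the channel covariance matrix passed through hypothetical weights `w` and a sigmoid) in place of the exact second-order attention design; a mask computed as a sigmoid over a hypothetical 1×1 projection `v`; and factorised trilinear pooling with hypothetical projection matrices, where the mask weights each spatial location. All function names and the random weights are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def covariance_channel_attention(x, w):
    """Simplified second-order channel attention (hypothetical design).

    x: (C, H, W) feature map; w: (C, C) hypothetical learnable weights.
    Returns a channel attention vector of shape (C,) with values in (0, 1).
    """
    c = x.shape[0]
    feats = x.reshape(c, -1)                            # (C, N), N = H * W
    centered = feats - feats.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / feats.shape[1]        # (C, C) channel covariance
    return sigmoid(w @ cov.mean(axis=1))                # gate from covariance row means

def adaptive_feature_mask(x, v):
    """Spatial attention matrix from a 1x1 projection (hypothetical design).

    x: (C, H, W); v: (C,) hypothetical 1x1-conv weights. Returns (H, W).
    """
    return sigmoid(np.einsum("c,chw->hw", v, x))

def trilinear_pool(x1, x2, x3, projections, mask):
    """Factorised trilinear pooling of three same-shaped feature maps.

    projections: three (C, d) matrices; mask: (H, W) spatial weights.
    Returns an L2-normalised descriptor of shape (d,).
    """
    n = mask.size
    z = np.ones((projections[0].shape[1], n))
    for x, u in zip((x1, x2, x3), projections):
        z *= u.T @ x.reshape(x.shape[0], -1)            # element-wise product of (d, N) projections
    pooled = (z * mask.reshape(1, -1)).sum(axis=1) / n  # mask-weighted pooling over locations
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # signed square root
    return pooled / (np.linalg.norm(pooled) + 1e-12)    # L2 normalisation

# Usage with random tensors standing in for backbone features.
rng = np.random.default_rng(0)
C, H, W, d = 8, 7, 7, 16
layers = [rng.standard_normal((C, H, W)) for _ in range(3)]
a = covariance_channel_attention(layers[-1], 0.1 * rng.standard_normal((C, C)))
layers = [x * a[:, None, None] for x in layers]         # re-weight channels
m = adaptive_feature_mask(layers[-1], 0.1 * rng.standard_normal(C))
proj = [0.1 * rng.standard_normal((C, d)) for _ in range(3)]
feature = trilinear_pool(*layers, proj, m)
print(feature.shape)  # (16,)
```

Multiplying three low-dimensional projections element-wise captures third-order interactions across the three layers without materialising a C×C×C outer-product tensor, which is the usual motivation for factorised bilinear and higher-order pooling.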

CLC number: TP391

SHI Jin, XU Yang, CAO Bin. Fine-Grained Image Categorization Based on Adaptive Trilinear Pooling Network[J]. Computer Engineering, 2023, 49(5): 239-246,254.

https://www.ecice06.com/CN/Y2023/V49/I5/239

Figures/Tables: 9

