Multi-label Image Classification Algorithm Based on Multi-scale Attention and Graph Model

doi:10.19678/j.issn.1000-3428.0061072

Abstract

Abstract: Multi-label image classification is an important research direction in computer vision and is widely used in image recognition, detection, and related applications. Existing methods do not make effective use of label correlations or of the correspondence between label semantics and image features, which limits their classification performance. This paper proposes a new multi-label image classification algorithm. A graph model is built from label co-occurrence information and prior knowledge about the labels; multi-scale attention is used to learn the objects present in the image features; and label-guided attention fuses label semantic features with image features, so that label correlations and label semantics are incorporated into model learning. On this basis, a dynamic graph model is constructed using a graph attention mechanism, and the label information graph is updated dynamically during learning to fully integrate image and label information. Experiments on multi-label image classification show that, compared with the best-performing existing algorithm, Multi-Label Graph Convolutional Network (MLGCN), the proposed algorithm improves mean Average Precision (mAP) by 0.6 and 1.2 percentage points on the Visual Object Classes-2007 (VOC-2007) and Common Objects in Context-2012 (COCO-2012) datasets, respectively.

Keywords: multi-label, label semantics, image features, attention mechanism, dynamic graph, multi-scale
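
The abstract states that the label graph is built from label co-occurrence information but does not give the construction. As a minimal sketch only, the Python below shows one common way to turn training annotations into a directed label graph: estimate the conditional probability P(label j | label i) from co-occurrence counts and keep an edge when it reaches a threshold. The function name `cooccurrence_graph`, the `threshold` value, and the toy annotations are all assumptions made for illustration; they are not taken from the paper, and the authors' graph also uses label prior knowledge, which is not modeled here.

```python
from itertools import permutations


def cooccurrence_graph(label_sets, num_labels, threshold=0.4):
    """Build a directed label graph from per-image label sets.

    Edge i -> j carries the empirical P(label j | label i) and is kept only
    when that probability is >= threshold. `threshold` is a hypothetical
    hyperparameter, not a value reported in the paper.
    """
    count = [0] * num_labels  # number of images containing label i
    pair = [[0] * num_labels for _ in range(num_labels)]  # images with both i and j
    for labels in label_sets:
        unique = set(labels)
        for i in unique:
            count[i] += 1
        for i, j in permutations(unique, 2):
            pair[i][j] += 1

    adj = [[0.0] * num_labels for _ in range(num_labels)]
    for i in range(num_labels):
        for j in range(num_labels):
            if i != j and count[i] > 0:
                p = pair[i][j] / count[i]
                adj[i][j] = p if p >= threshold else 0.0
    return adj


# Made-up annotations: 0 = person, 1 = bicycle, 2 = dog.
toy = [{0, 1}, {0, 1}, {0, 2}, {0}, {2}]
adj = cooccurrence_graph(toy, num_labels=3)
```

The resulting matrix is asymmetric by design: in the toy data every image with a bicycle also contains a person, while only half of the images with a person contain a bicycle, so the edge bicycle -> person is stronger than person -> bicycle.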

CLC number: TP391.41

ZHU Xudong, XIONG Yun. Multi-label Image Classification Algorithm Based on Multi-scale Attention and Graph Model[J]. Computer Engineering, 2022, 48(4): 173-178,190.

https://www.ecice06.com/CN/Y2022/V48/I4/173

Figures/Tables: 7
