Computer Engineering ›› 2022, Vol. 48 ›› Issue (4): 173-178, 190. doi: 10.19678/j.issn.1000-3428.0061072

• Graphics and Image Processing •

Multi-label Image Classification Algorithm Based on Multi-scale Attention and Graph Model

ZHU Xudong1,2, XIONG Yun1,2

  1. School of Computer Science, Fudan University, Shanghai 200433, China;
  2. Shanghai Key Laboratory of Data Science, Shanghai 200433, China
  • Received: 2021-03-10 Revised: 2021-04-30 Published: 2021-05-07
  • About the authors: ZHU Xudong (朱旭东, born 1995), male, master's student, whose main research interests are computer vision and graph neural networks; XIONG Yun (熊贇), professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (U1636207); Shanghai Science and Technology Commission Project (19511121204).

Abstract: Multi-label image classification is an important research direction in computer vision and is widely used in image recognition, detection, and related applications. Existing multi-label image classification methods cannot effectively exploit label correlation information or the correspondence between label semantics and image features, which results in poor classification performance. This paper proposes a new algorithm for multi-label image classification. A graph model is built from label co-occurrence information and label prior knowledge, multi-scale attention is used to learn the targets in the image features, and label-guided attention fuses label semantic features with image features, so that label correlation and label semantic information are integrated into model learning. On this basis, a dynamic graph model is constructed with a graph attention mechanism, and the label information graph model is dynamically updated during learning to fully integrate image and label information. Experimental results on multi-label image classification tasks show that, compared with the existing best-performing algorithm, the Multi-Label Graph Convolutional Network (MLGCN), the proposed algorithm improves the mean Average Precision (mAP) on the Visual Object Classes-2007 (VOC-2007) and Common Objects in Context-2012 (COCO-2012) datasets by 0.6 and 1.2 percentage points, respectively, a clear gain in performance.

Key words: multi-label, label semantics, image features, attention mechanism, dynamic graph, multi-scale
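
The abstract states that the graph model is built from label co-occurrence information, but it does not give the construction. As a rough illustration only, the sketch below assumes one common approach: estimate the conditional probability that label j appears given that label i appears, and keep a directed edge when that estimate reaches a threshold. The function name `cooccurrence_graph`, the threshold value, and the toy annotations are hypothetical and are not taken from the paper.

```python
from typing import List


def cooccurrence_graph(labels: List[List[int]], threshold: float = 0.4) -> List[List[int]]:
    """Build a binary directed label graph from multi-hot annotations.

    labels[n][i] is 1 when image n carries label i, else 0.
    An edge i -> j is kept when the estimated P(label j | label i)
    reaches `threshold` (an illustrative value, not from the paper).
    """
    num_labels = len(labels[0])
    # counts[i][j] = number of images that contain both label i and label j
    counts = [[0] * num_labels for _ in range(num_labels)]
    for row in labels:
        present = [i for i, flag in enumerate(row) if flag]
        for i in present:
            for j in present:
                counts[i][j] += 1

    adjacency = [[0] * num_labels for _ in range(num_labels)]
    for i in range(num_labels):
        occurrences = counts[i][i]  # number of images that contain label i
        if occurrences == 0:
            continue  # label never appears, so its row stays empty
        for j in range(num_labels):
            if i != j and counts[i][j] / occurrences >= threshold:
                adjacency[i][j] = 1
    return adjacency


# Toy annotations for three hypothetical labels: person, bicycle, boat.
toy_labels = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
]
graph = cooccurrence_graph(toy_labels, threshold=0.4)
```

Because the conditional probabilities are not symmetric, the resulting graph is directed: in the toy data every image with a bicycle also contains a person, while only half of the images with a person contain a bicycle, so raising the threshold removes the person-to-bicycle edge before the reverse one.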

CLC number: