
Computer Engineering, 2026, Vol. 52, Issue (4): 229-238. doi: 10.19678/j.issn.1000-3428.0069945

• Computer Vision and Image Processing •

Multi-Label Image Classification Based on Label Visual Prototype Learning

LI Jiao1, FAN Haodong1,*, HONG Xudong1,2, XU Zhenyi2,3, FAN Xu1, HUANG Jun1,2

  1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, Anhui, China
    2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230094, Anhui, China
    3. Anhui Engineering Research Center for Intelligent Application and Security of Industrial Internet, Maanshan 243032, Anhui, China
  • Received: 2024-06-03 Revised: 2024-09-07 Online: 2026-04-15 Published: 2024-12-02
  • Contact: FAN Haodong

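  • About the authors:

    LI Jiao, female, master's student; main research interests: deep learning, computer vision

    FAN Haodong (corresponding author), teaching assistant, M.S.

    HONG Xudong, lecturer, Ph.D.

    XU Zhenyi, associate research fellow

    FAN Xu, lecturer, Ph.D.

    HUANG Jun (CCF member), associate professor, Ph.D.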

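  • Supported by:
    National Natural Science Foundation of China (61806005); Open Fund of the Anhui Engineering Research Center for Intelligent Application and Security of Industrial Internet (IASII22-03); Anhui Provincial Support Program for Outstanding Young Talents in Colleges and Universities (gxyqZD2022032); University Synergy Innovation Program of Anhui Province (GXXT-2022-052)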

Abstract:

Many multi-label image classification methods use label semantic information and label co-occurrence probabilities as prior knowledge to guide model learning. However, most of these methods depend on additional semantic information and therefore struggle with the information mismatch between modalities, and the estimated co-occurrence probabilities are sensitive to data imbalance and noise. To address these issues, this study proposes a multi-label image classification method based on label visual prototype learning, which uses only the visual information of the image itself and builds a multi-label classifier by generating label visual prototypes. The method reduces reliance on prior knowledge, makes full use of the available visual information, and improves classification performance. First, an attention module based on class-specific activation maps guides the model toward the image regions more relevant to each class and produces class-specific feature representations. Second, a visual prototype is captured for each label, and these prototypes form a label visual prototype dictionary that exploits how well visual features suit the image classification task. Finally, this dictionary serves as the multi-label classifier: the visual features of the input image are reconstructed from it to obtain the predicted probability of each label. Experimental results on three standard multi-label image classification datasets show that the method achieves higher classification accuracy than comparable methods.

Key words: deep learning, multi-label image classification, label visual prototype, dictionary learning, attention mechanism
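The abstract describes the three steps only at a high level, so the plain-Python sketch below illustrates one plausible reading of them; it is not the authors' implementation. Several details are assumptions not stated in the abstract: the class-specific activation map is taken to be the dot product of each spatial feature with a per-class weight vector (CAM-style, as a 1x1 convolution would produce) and is turned into spatial attention with a softmax; each label visual prototype is the mean class-specific feature over training images carrying that label; and "reconstruction" is modeled as ridge regression of each class-specific feature onto the prototype dictionary, with the coefficient of the class's own prototype passed through a sigmoid. The function names and the parameters `lam`, `offset`, and `temperature` are hypothetical.

```python
# Hypothetical sketch of label-visual-prototype classification (not the authors' code).
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vec) -> Vec:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    # Branching on the sign keeps math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def class_specific_features(feature_map: Mat, cam_weights: Mat) -> Mat:
    """Step 1 (assumed form): one attention-pooled feature per class.

    feature_map: HW spatial feature vectors of dimension D (flattened backbone output).
    cam_weights: one D-dim weight vector per class (hypothetical 1x1-conv weights).
    """
    dim = len(feature_map[0])
    feats = []
    for w in cam_weights:
        activation = [dot(w, x) for x in feature_map]  # flattened class activation map
        attn = softmax(activation)                     # spatial attention for this class
        feats.append([sum(a * x[d] for a, x in zip(attn, feature_map)) for d in range(dim)])
    return feats


def build_prototypes(train_feats: List[Mat], train_labels: List[List[int]]) -> Mat:
    """Step 2 (assumed form): prototype of class c = mean class-c feature over positives."""
    num_classes = len(train_labels[0])
    dim = len(train_feats[0][0])
    protos = []
    for c in range(num_classes):
        members = [f[c] for f, y in zip(train_feats, train_labels) if y[c] == 1]
        if members:
            protos.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
        else:
            protos.append([0.0] * dim)  # no positive example: empty dictionary atom
    return protos


def solve(a: Mat, b: Vec) -> Vec:
    """Gaussian elimination with partial pivoting for a small square system a @ x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def predict_proba(feature_map: Mat, cam_weights: Mat, protos: Mat,
                  lam: float = 0.1, offset: float = 0.5, temperature: float = 0.1) -> Vec:
    """Step 3 (assumed form): ridge-reconstruct each class-specific feature from the
    prototype dictionary and score class c by the coefficient of its own prototype."""
    n = len(protos)
    # D^T D + lam * I is positive definite for lam > 0, so the system is always solvable.
    gram = [[dot(protos[i], protos[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    probs = []
    for c, f in enumerate(class_specific_features(feature_map, cam_weights)):
        alpha = solve(gram, [dot(p, f) for p in protos])  # reconstruction coefficients
        probs.append(sigmoid((alpha[c] - offset) / temperature))
    return probs
```

Scoring class c by the coefficient of its own prototype means that a feature already explained by other labels' prototypes, such as the diffuse attention output for an absent class, receives a coefficient near zero. In this sketch the attention weights are fixed inputs and the prototypes are simple class means; a trained model would learn them from data, and the paper may construct them differently.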
