Multi-Label Image Classification Based on Label Visual Prototype Learning

Computer Engineering, 2026, Vol. 52, Issue 4: 229-238. doi: 10.19678/j.issn.1000-3428.0069945

• Computer Vision and Image Processing •

LI Jiao¹, FAN Haodong¹,*, HONG Xudong¹,², XU Zhenyi²,³, FAN Xu¹, HUANG Jun¹,²

1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, Anhui, China
2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230094, Anhui, China
3. Anhui Engineering Research Center for Intelligent Application and Security of Industrial Internet, Maanshan 243032, Anhui, China

Received: 2024-06-03 Revised: 2024-09-07 Online: 2026-04-15 Published: 2024-12-02
Contact: FAN Haodong

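About the authors:
LI Jiao, female, master's student; research interests: deep learning and computer vision
FAN Haodong (corresponding author), teaching assistant, M.S.
HONG Xudong, lecturer, Ph.D.
XU Zhenyi, associate research fellow
FAN Xu, lecturer, Ph.D.
HUANG Jun (CCF member), associate professor, Ph.D.
Funding:
National Natural Science Foundation of China (61806005); Open Fund of the Anhui Engineering Research Center for Intelligent Application and Security of Industrial Internet (IASII22-03); Outstanding Young Talents Support Program of Anhui Universities (gxyqZD2022032); University Synergy Innovation Program of Anhui Province (GXXT-2022-052)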

Abstract

Many multi-label image classification studies use label semantic information and label co-occurrence probabilities as prior knowledge to guide the learning of multi-label classification models. However, most of these methods rely on additional semantic information, which makes it difficult to handle information mismatch between modalities, and the computed label co-occurrence probabilities are susceptible to data imbalance and noise. To address these issues, this study proposes a multi-label image classification method based on label visual prototype learning, which uses only the visual information of an image and constructs a multi-label classifier by generating label visual prototypes. This method reduces the reliance on prior knowledge and makes full use of the visual information, effectively improving classification performance. First, an attention module based on class-specific activation maps is designed to guide the model to focus on the image regions most relevant to each class and to generate class-specific feature representations. Second, by capturing the visual prototype representation of each label, a label visual prototype dictionary is constructed, exploiting the fact that visual features are well suited to image classification tasks. Finally, using this dictionary as a multi-label classifier, the visual features of the input image are reconstructed to obtain the predicted probability of each label. Experimental results show that this method improves classification accuracy over comparable methods on three standard multi-label image classification datasets.
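The three steps in the abstract can be pictured with a small toy sketch. It is not the authors' implementation: the abstract does not specify the attention formula, the prototype learning procedure, or the reconstruction step, so everything below is an assumption made for illustration. The function names (`class_specific_features`, `predict`), the fixed `class_weights` and `prototypes`, the softmax spatial attention, and the use of cosine similarity followed by a sigmoid in place of the dictionary-based reconstruction are all hypothetical stand-ins.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def class_specific_features(feature_map, class_weights):
    """Step 1 (assumed form): one attention-pooled feature per label.

    feature_map: list of spatial positions, each a d-dimensional vector.
    class_weights: one d-dimensional vector per label (hypothetical).
    """
    dim = len(feature_map[0])
    features = []
    for weight in class_weights:
        # Class-specific activation map: response of this label at each position.
        activation = [dot(weight, pos) for pos in feature_map]
        # Turn the activation map into spatial attention weights that sum to 1.
        attention = softmax(activation)
        pooled = [
            sum(a * pos[j] for a, pos in zip(attention, feature_map))
            for j in range(dim)
        ]
        features.append(pooled)
    return features


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)


def predict(feature_map, class_weights, prototypes, scale=5.0):
    """Steps 2-3 (assumed form): score each label against its prototype.

    prototypes plays the role of the label visual prototype dictionary.
    Cosine similarity plus a sigmoid is a stand-in for the reconstruction
    step, which the abstract does not describe in detail.
    """
    features = class_specific_features(feature_map, class_weights)
    return [
        1.0 / (1.0 + math.exp(-scale * cosine(f, p)))
        for f, p in zip(features, prototypes)
    ]


# Toy input: 4 spatial positions, 3 feature channels, 2 labels.
feature_map = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 0.1], [0.0, 0.1, 0.0]]
class_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

probs = predict(feature_map, class_weights, prototypes)
print(probs)
```

In this toy input the strongest responses lie along the first channel, which matches the prototype of label 0, so label 0 receives the higher score. In a trained model the prototypes and attention weights would be learned from data rather than fixed by hand.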

Key words: deep learning, multi-label image classification, label visual prototype, dictionary learning, attention mechanism

LI Jiao, FAN Haodong, HONG Xudong, XU Zhenyi, FAN Xu, HUANG Jun. Multi-Label Image Classification Based on Label Visual Prototype Learning[J]. Computer Engineering, 2026, 52(4): 229-238.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069945

https://www.ecice06.com/EN/Y2026/V52/I4/229

Figures/Tables 11

Fig.1 LVPL framework structure

Fig.2 Classification results on Pascal VOC2007 dataset

Fig.3 Visualization example of class-specific activation maps on Pascal VOC2007 dataset

Fig.4 Comparison of different θ values on three datasets

Fig.5 Comparison of different β values on three datasets

Fig.6 Comparison of different δ values on three datasets
