
计算机工程 (Computer Engineering) ›› 2023, Vol. 49 ›› Issue (10): 64-71. doi: 10.19678/j.issn.1000-3428.0065806

• Artificial Intelligence and Pattern Recognition •

  • Author notes:

    Zhixiang YUAN (b. 1973), male, associate professor, CCF member; research interests: machine learning and Petri net theory

    Yaqing WANG, master's student

    Jun HUANG, associate professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (61806005); Key Project of Scientific Research in Universities of Anhui Province (KJ2021A0372); Key Project of Scientific Research in Universities of Anhui Province (KJ2021A0373); Support Program for Outstanding Young Talents in Universities of Anhui Province (gxyqZD2022032)

Multi-Label Zero-Shot Classification Based on Deep Mutual Learning

Zhixiang YUAN, Yaqing WANG, Jun HUANG   

  1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, Anhui, China
  • Received:2022-09-20 Online:2023-10-15 Published:2023-10-10


Abstract:

Numerous methods address zero-shot image classification, but few target the multi-label zero-shot setting. In existing solutions, beyond the labeled dataset and the given prior knowledge, a model exploits either only image region information or only label semantic information. Based on deep mutual learning, this study proposes a method that exploits both image region and label semantic information. Two sub-networks are designed. Sub-network 1 enhances the visual features of the image: a multi-head self-attention mechanism relates the features of different image regions to obtain a region-based visual feature representation, which is mapped into the semantic space to output a predicted probability distribution. Sub-network 2 fuses label semantic information with image visual features: the correlation between labels and image region features is computed to obtain a semantics-based visual feature representation, which is likewise mapped into the semantic space to output a probability distribution. Finally, deep mutual learning is introduced: the probability distributions produced by the two sub-networks serve as training signals for each other, so each sub-network learns from its peer's predictions while optimizing its own classification performance, effectively improving multi-label zero-shot image classification. Experimental results show that the F1 score of the proposed method on the MS COCO dataset is 5.2 percentage points higher than that of the Deep0Tag method.
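The two ingredients the abstract describes can be illustrated concretely. The sketch below is not the authors' implementation; it is a minimal, hedged illustration in plain Python of (a) sub-network-2-style fusion, where attention weights come from the correlation (dot product) between a label embedding and each region feature, and (b) the deep-mutual-learning objective, where each peer network minimizes its own supervised loss plus a KL term pulling its per-label predictions toward the other network's. All function names, the Bernoulli form of the KL term, and the toy vectors are assumptions for illustration only.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def label_guided_feature(label_emb, region_feats):
    """Attend over image regions with weights given by the dot product
    between a label embedding and each region feature, then return the
    weighted sum of region features (a semantics-based visual feature)."""
    scores = [sum(l * r for l, r in zip(label_emb, feat)) for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    return [sum(w * feat[d] for w, feat in zip(weights, region_feats))
            for d in range(dim)]

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between two sets of per-label Bernoulli predictions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        + (1 - pi) * math.log((1 - pi + eps) / (1 - qi + eps))
        for pi, qi in zip(p, q)
    )

def mutual_learning_losses(probs1, probs2, targets):
    """Deep-mutual-learning objective for two peers: each network minimizes
    its own binary cross-entropy plus a KL term toward the peer's output
    (the peer's predictions are treated as fixed targets)."""
    eps = 1e-12
    def bce(probs):
        return -sum(
            t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
            for p, t in zip(probs, targets)
        ) / len(probs)
    loss1 = bce(probs1) + bernoulli_kl(probs2, probs1)  # net 1 mimics net 2
    loss2 = bce(probs2) + bernoulli_kl(probs1, probs2)  # net 2 mimics net 1
    return loss1, loss2
```

Note the asymmetry: the KL term for each network treats the peer's distribution as the reference, so the two losses differ in general; when the two networks agree exactly, both KL terms vanish and each loss reduces to plain supervised cross-entropy.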

Key words: deep learning, image classification, multi-label learning, zero-shot learning, mutual learning