
Computer Engineering ›› 2023, Vol. 49 ›› Issue (11): 220-230. doi: 10.19678/j.issn.1000-3428.0066122

• Graphics and Image Processing •

  • About the authors:

    LÜ Xueqiang (1970—), male, professor, Ph.D.; his main research interests are multimedia information processing and computer vision.

    ZHAO Xingqiang, master's degree candidate.

    JIA Zhibin, master's degree candidate.

  • Funding:
    National Natural Science Foundation of China (62171043); Beijing Natural Science Foundation (4212020)

Visual Semantic Interpretation Model for Classification Network

Xueqiang LÜ1, Xingqiang ZHAO2, Zhibin JIA1, Jing HAN1,*   

  1. Beijing Key Laboratory of Internet Culture Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China
    2. Beijing Key Laboratory of Measurement and Control of Mechanical and Electrical System Technology, Beijing Information Science and Technology University, Beijing 100192, China
  • Received: 2022-10-31 Online: 2023-11-15 Published: 2023-02-20
  • Contact: Jing HAN


Abstract:

The interpretability of deep learning is critical for promoting its application in military scenarios. Mainstream methods visualize the last layer of convolutional features with class activation maps, but how the network classifies on the basis of these features remains unclear. To address this problem, a visual semantic interpretation model for classification networks is designed. First, considering both forward and backward propagation, the CGNIS algorithm is proposed to obtain the neurons that are important to the classification result and map them back to the original image, yielding more refined visual features. Second, the classification network IRENet is proposed: SIRM and ECA modules are added to the middle layers of VGG16 to recognize the visual features, so that the semantic features they contain can be extracted more objectively; the visual features, semantic features, and important-neuron scores are then combined to generate explanatory statements describing the model's classification process. Finally, ten classes are extracted from the ImageNet2012 dataset for experiments. The results show that after the important neurons of a given class identified by the CGNIS algorithm are removed, the classification accuracy of that class decreases by more than 3%; on the semantic feature extraction task, the F1 score, accuracy, precision, and recall of IRENet all improve by more than 2% over ResNet101 and other classification networks. In addition, experiments on the aircraft class using CGNIS and IRENet generate explanatory statements for the model's classification process.
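The abstract does not give the CGNIS scoring formula. Purely as an illustrative sketch, the following shows one common way to combine forward activations with backward gradients into a per-channel ("important neuron") score, in the spirit of Grad-CAM; the function name, the toy data, and the exact weighting are assumptions for illustration, not the authors' method.

```python
import numpy as np

def channel_importance(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Hypothetical CGNIS-style score: for each channel c of a convolutional
    layer, weight the ReLU'd forward activation map by the global-average-pooled
    gradient of the target class score, then sum over the spatial dimensions.
    activations, gradients: arrays of shape (C, H, W); returns shape (C,)."""
    weights = gradients.mean(axis=(1, 2))                     # one weight per channel
    return np.maximum(activations, 0).sum(axis=(1, 2)) * weights

# Toy example: 4 channels with 3x3 feature maps; channel 2 carries both a
# strong activation and a strong positive gradient, so it should rank first.
rng = np.random.default_rng(0)
acts = rng.random((4, 3, 3)) * 0.1
grads = rng.random((4, 3, 3)) * 0.1
acts[2] += 1.0
grads[2] += 1.0

ranking = np.argsort(channel_importance(acts, grads))[::-1]
print(ranking[0])  # prints 2: the channel with the strongest activation and gradient
```

Such a ranking also supports the paper's ablation check: zeroing the top-ranked channels for a class and re-measuring that class's accuracy (which the authors report drops by more than 3%).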

Key words: classification network, interpretability, class activation map, important neuron, semantic information