Computer Engineering ›› 2024, Vol. 50 ›› Issue (3): 191-199. doi: 10.19678/j.issn.1000-3428.0067369

• Graphics and Image Processing •

Semantic Segmentation Improvement Method Based on Deep Supervision for the Construction of Latent Space

Bohan WANG* (王柏涵), Xiaoyan JIANG (姜晓燕), Liuyi FAN (范柳伊)

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
  • Received: 2023-04-06 Online: 2024-03-15 Published: 2023-07-11
  • Contact: Bohan WANG
  • Supported by:
    Joint Funds of the National Natural Science Foundation of China (U2033218); Key Program of the National Natural Science Foundation of China (61831018)

Abstract:

Convolution operations struggle to capture relationships between distant regions in semantic segmentation, which leads to segmentation results that contradict human common sense. To address this, a semantic segmentation improvement method based on deeply supervised latent space construction is proposed. The method follows a "feature map-latent space-feature map" pipeline: pixel features in image space are converted into node features in a latent space, and the positional and semantic relationships between regions are converted into connection weights between nodes. During latent space construction, a Kullback-Leibler divergence loss supervises the projection matrix so that features are not lost when the feature map is converted into latent space nodes, and an Information Noise Contrastive Estimation (InfoNCE) loss supervises the node feature representations against the ground-truth label representations so that image features remain consistent with the labels. A Graph Neural Network (GNN) then performs semantic reasoning on the constructed latent space and learns the relationships between nodes, which gives the model the ability to learn semantic relationships between regions and reduces the anti-common-sense phenomenon in the segmentation results. Experiments on the public CityScapes dataset show that the proposed method achieves a mean Intersection over Union (mIoU) of 81.1%, 2.6 percentage points higher than the baseline segmentation network.
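
The data flow from feature map to latent space and back can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function name `latent_space_block`, the input `proj_logits`, the similarity-based connection weights, the single parameter-free message-passing step, and the residual connection are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def latent_space_block(pixels, proj_logits):
    """Hypothetical block: N x C pixel features, N x K pixel-to-node scores."""
    # Projection matrix: each pixel is softly assigned to the K nodes.
    proj = [softmax(row) for row in proj_logits]                      # N x K
    # Feature map -> latent space: nodes are weighted sums of pixel features.
    nodes = matmul(transpose(proj), pixels)                           # K x C
    # Connection weights between nodes, here from feature similarity (assumed).
    adj = [softmax(row) for row in matmul(nodes, transpose(nodes))]   # K x K
    # One round of message passing: each node aggregates the other nodes.
    nodes = matmul(adj, nodes)                                        # K x C
    # Latent space -> feature map, added back to the input as a residual.
    back = matmul(proj, nodes)                                        # N x C
    out = [[p + b for p, b in zip(prow, brow)]
           for prow, brow in zip(pixels, back)]
    return proj, nodes, out


# Four pixels with two channels, projected onto two latent nodes.
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
proj_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
proj, nodes, out = latent_space_block(pixels, proj_logits)
```

In a trained network the projection scores and the graph update would carry learnable weights; they are fixed here so that only the data flow is visible.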

Key words: semantic segmentation, Convolutional Neural Network (CNN), deep supervision, Graph Neural Network (GNN), anti-common sense phenomenon
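
The two supervision terms named in the abstract, the Kullback-Leibler divergence loss and the InfoNCE loss, can be sketched in plain Python as follows. This is a generic illustration under stated assumptions rather than the paper's formulation: the abstract does not say which target distribution supervises the projection matrix, so the inputs to `kl_divergence` are hypothetical, and pairing node `i` with label representation `i`, the cosine similarity, and the `temperature` value in `info_nce` are likewise assumed.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(node_feats, label_feats, temperature=0.1):
    """Mean InfoNCE loss; node i and label i form the positive pair (assumed)."""
    losses = []
    for i, z in enumerate(node_feats):
        logits = [cosine(z, y) / temperature for y in label_feats]
        # Log-sum-exp over all label representations (positive and negatives).
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the positive pair.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Hypothetical per-pixel assignment distributions: target vs. predicted.
kl = kl_divergence([0.5, 0.5], [0.9, 0.1])

# Node features that match their label representations give a small loss;
# swapping the label representations gives a large one.
nodes = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(nodes, [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce(nodes, [[0.0, 1.0], [1.0, 0.0]])
```

The KL term is zero only when the two distributions coincide, and the InfoNCE term shrinks as each node representation moves closer to its own label representation than to the others, which matches the consistency goal described in the abstract.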