Semantic Segmentation Improvement Method Based on Deep Supervision for the Construction of Latent Space

doi:10.19678/j.issn.1000-3428.0067369

Abstract

Existing convolution operations cannot effectively capture relationships between distant regions in semantic segmentation, which leads to segmentation results that contradict human common sense. To address this, a semantic segmentation improvement method based on deeply supervised latent space construction is proposed. The method follows a "feature map-latent space-feature map" process: pixel features in image space are converted into node features in a latent space, and the positional and semantic relationships between regions are converted into connection weights between nodes, which achieves the feature conversion from feature map to latent space. During latent space construction, a Kullback-Leibler divergence loss supervises the projection matrix to avoid losing features in the transformation from feature maps to latent space nodes, and an Information Noise Contrastive Estimation (InfoNCE) loss supervises the node feature representations against the ground-truth label representations to keep image features consistent with the labels. A Graph Neural Network (GNN) then performs semantic inference on the constructed latent space, learning the relationships between nodes and giving the model the ability to learn semantic relationships between regions, which reduces the anti-common-sense phenomenon in the segmentation results. Experimental results on the public CityScapes dataset show that the proposed method reaches a mean Intersection over Union (mIoU) of 81.1%, 2.6 percentage points higher than the baseline segmentation network, and effectively improves the segmentation results.
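The "feature map-latent space-feature map" process and the two supervision terms described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the layer definitions, so the function names, the softmax-based projection, the similarity-based adjacency, the residual re-projection, and the label-derived targets used for the Kullback-Leibler and InfoNCE terms are all assumptions chosen for illustration, and the sizes are toy values.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def project_to_nodes(feats, w_proj):
    """Feature map -> latent space (hypothetical projection).

    feats:  (P, C) pixel features, P = H * W flattened pixels.
    w_proj: (C, N) assumed learnable weights scoring pixels against N nodes.
    Returns the (P, N) soft assignment and the (N, C) node features.
    """
    assign = softmax(feats @ w_proj, axis=1)              # per-pixel distribution over nodes
    weights = assign / assign.sum(axis=0, keepdims=True)  # normalise each node's pixel weights
    nodes = weights.T @ feats                             # weighted mean of pixel features
    return assign, nodes


def graph_reasoning(nodes, w_gcn):
    """One graph-convolution step over an assumed similarity-based adjacency."""
    adj = softmax(nodes @ nodes.T, axis=1)                # (N, N) row-normalised edge weights
    return np.maximum(adj @ nodes @ w_gcn, 0.0)           # propagate, transform, ReLU


def reproject(feats, assign, nodes):
    """Latent space -> feature map, added back as a residual (assumption)."""
    return feats + assign @ nodes


def kl_divergence(p, q, eps=1e-8):
    """Mean KL(p || q) over rows; p and q are row-stochastic with equal shape."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `anchors` should match row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    b = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy data: 64 pixels, 16 channels, 4 latent nodes (one node per class, an assumption).
rng = np.random.default_rng(0)
P, C, N = 64, 16, 4
feats = rng.normal(size=(P, C))
labels = rng.integers(0, N, size=P)
labels[:N] = np.arange(N)                                 # make sure every class appears

assign, nodes = project_to_nodes(feats, rng.normal(size=(C, N)))
reasoned = graph_reasoning(nodes, 0.1 * rng.normal(size=(C, C)))
refined = reproject(feats, assign, reasoned)

# Placeholder supervision targets built from the labels (assumptions).
one_hot = np.eye(N)[labels]
target = 0.9 * one_hot + 0.1 / N                          # smoothed label assignment
label_repr = (one_hot / one_hot.sum(axis=0)).T @ feats    # per-class mean feature

loss_kl = kl_divergence(target, assign)                   # supervises the projection matrix
loss_nce = info_nce(nodes, label_repr)                    # aligns nodes with label representations
```

In this sketch the refined feature map has the same shape as the input, so it could be passed to an ordinary segmentation head, and the two loss values would be added to the main segmentation loss with weights that the abstract does not specify.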

Keywords: semantic segmentation, Convolutional Neural Network (CNN), deep supervision, Graph Neural Network (GNN), anti-common-sense phenomenon

Bohan WANG, Xiaoyan JIANG, Liuyi FAN. Semantic Segmentation Improvement Method Based on Deep Supervision for the Construction of Latent Space[J]. Computer Engineering, 2024, 50(3): 191-199.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0067369

http://www.ecice06.com/EN/Y2024/V50/I3/191

Figures/Tables 8

Fig.1 Overall procedure of semantic segmentation improvement network based on deep supervised latent space construction

Fig.2 Procedure of latent space construction based on feature transformation

Fig.3 Schematic diagram of graph transfer in graph convolutional network

Fig.4 Visualization results of the projection matrix at different training stages

Fig.5 Visualization results of semantic segmentation among different methods
