Semantic Segmentation Algorithm Based on Dense Connection and Feature Enhancement

doi:10.19678/j.issn.1000-3428.0063891

Abstract: In the semantic segmentation algorithm DeepLabv3+, the feature information extracted by the backbone network is not fully utilized, which leads to discontinuous segmentation edges, missed targets, and segmentation errors. To address these problems, a semantic segmentation algorithm based on dense connection and feature enhancement is proposed. The algorithm uses a Shared-Atrous Spatial Pyramid Pooling (S-ASPP) module to establish connections between multiple atrous convolutions, which strengthens the semantic correlation between local information, captures densely sampled pixels, and improves the utilization of high-level feature information. Next, a Feature Pyramid Enhancement Module (FPEM) and a Feature Fusion Module (FFM) are introduced to process the multilayer feature information output by the backbone network and enhance the expressive power of the features. The FFM fuses the feature information of different scales output by the FPEM, which improves the complementarity between feature layers and yields more comprehensive feature map information. Finally, the outputs of the S-ASPP and the FFM are concatenated and convolved to obtain the final segmentation result. Experiments on the PASCAL VOC 2012 and Cityscapes datasets show that the proposed algorithm achieves mean Intersection over Union (mIoU) values of 81.13% and 73.39%, respectively, which are 2.3 and 2.1 percentage points higher than those of the baseline algorithm DeepLabv3+. The algorithm makes fuller use of the feature information from each layer of the backbone network and improves segmentation accuracy.
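The mIoU figures reported in the abstract are per-class intersection-over-union scores averaged over classes. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function names and the toy labels are hypothetical, and the use of 255 as an ignored "void" label is an assumption based on common practice for these datasets.

```python
from typing import List, Optional


def confusion_matrix(pred: List[int], target: List[int], num_classes: int,
                     ignore_index: Optional[int] = 255) -> List[List[int]]:
    """Count (true class, predicted class) pairs over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumption: pixels labeled ignore_index are not scored
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes that actually occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:  # classes absent from both maps are left out of the mean
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy flattened label maps with three classes; 255 marks a void pixel.
    target = [0, 0, 1, 1, 2, 2, 255, 1]
    pred = [0, 1, 1, 1, 2, 0, 2, 1]
    cm = confusion_matrix(pred, target, num_classes=3)
    print(f"mIoU = {mean_iou(cm):.4f}")  # prints "mIoU = 0.5278"
```

In the toy example the per-class IoU values are 1/3, 3/4, and 1/2, so the mean is about 0.5278. A gain of 2.3 percentage points, as reported for PASCAL VOC 2012, is a difference between two such averages expressed in percent.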

Key words: semantic segmentation, DeepLabv3+ algorithm, Atrous Spatial Pyramid Pooling (ASPP), Feature Pyramid Enhancement Module (FPEM), feature fusion
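For readers unfamiliar with the atrous (dilated) convolution behind the ASPP keyword, here is a minimal one-dimensional sketch in plain Python. It only illustrates how the dilation rate widens the receptive field without adding weights. It is not the S-ASPP module from the paper, and the function names, kernel values, and rates are hypothetical. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped).

```python
from typing import List, Sequence


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float],
                  rate: int) -> List[float]:
    """1-D atrous convolution with zero padding and stride 1.

    Kernel taps are spaced `rate` samples apart, so the output has the
    same length as the input while each output sees a wider context.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate   # input index sampled by tap k
            if 0 <= j < len(signal):    # positions outside count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, rate: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    kernel = [1, 2, 3]
    # Parallel branches at several (illustrative) rates, in the spirit of a
    # pyramid of atrous convolutions applied to the same input.
    for rate in (1, 2, 4):
        print(rate, receptive_field(len(kernel), rate),
              atrous_conv1d(impulse, kernel, rate))
```

With a 3-tap kernel, rates 1, 2, and 4 span 3, 5, and 9 input positions respectively. Running several rates side by side on the same input therefore captures context at several scales, which is the general idea that pyramid pooling modules build on.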

CLC number: TP391.4

MA Sugang, CHEN Qimei, HOU Zhiqiang, YANG Xiaobao, ZHANG Zixian. Semantic Segmentation Algorithm Based on Dense Connection and Feature Enhancement[J]. Computer Engineering, 2023, 49(3): 263-270.

http://www.ecice06.com/CN/Y2023/V49/I3/263

Figures/Tables: 11

