Road Scene Semantic Segmentation Model Based on Multi-Scale Attention Mechanism

doi:10.19678/j.issn.1000-3428.0063257

Abstract

Abstract: Semantic segmentation of road scenes helps a vehicle perceive its surroundings so that it can avoid pedestrians, other vehicles, and small obstacles, which improves driving safety. To address the low recognition accuracy for small objects in road scene semantic segmentation, and the large number of network parameters that hinders deployment, this study proposes a semantic segmentation model based on a multi-scale attention mechanism. A multi-scale wavelet attention module, built on the multi-scale and multi-frequency analysis properties of the wavelet transform, is embedded into the encoder. By fusing feature information of different scales and frequencies, it retains more edge and contour detail. Hierarchical connections between the encoder and the decoder, together with an improved pyramid pooling module, extract features from multiple aspects, capturing more image detail while retaining contextual feature information. A multi-level loss function is designed to train the network and accelerate convergence. Experimental results on the Cambridge-driving Labeled Video Database (CamVid) show that the model achieves a mean Intersection over Union (mIoU) of 60.21% with nearly 30% fewer parameters than the DeepLabV3+ and DenseASPP models. The model improves segmentation accuracy without adding parameters and shows good robustness across different scenes.
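To make the wavelet idea in the abstract concrete, here is a minimal sketch; it is not the authors' module. It applies a 2D Haar transform to a feature map, recursing on the low-frequency band to cover several scales, and gates each channel by its accumulated high-frequency response. The Haar basis, the names `haar_dwt2` and `wavelet_channel_gate`, the `levels` parameter, and the fixed sigmoid gate (standing in for learned layers) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of a (C, H, W) array with even H and W.

    Returns four sub-bands (ll, lh, hl, hh), each shaped (C, H/2, W/2).
    """
    a = x[:, 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # detail from differences between columns
    hl = (a + b - c - d) / 2.0  # detail from differences between rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def wavelet_channel_gate(x, levels=2):
    """Toy multi-scale channel attention (hypothetical, not the paper's design).

    H and W must be divisible by 2**levels. A fixed sigmoid over centred
    scores stands in for the learned layers of a trained attention module.
    """
    score = np.zeros(x.shape[0])
    ll = x
    for _ in range(levels):
        # Recurse on the low-frequency band to reach the next, coarser scale.
        ll, lh, hl, hh = haar_dwt2(ll)
        # Mean absolute high-frequency response per channel, shape (C,).
        score += sum(np.abs(s).mean(axis=(1, 2)) for s in (lh, hl, hh))
    gate = 1.0 / (1.0 + np.exp(-(score - score.mean())))
    return x * gate[:, None, None], gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 8, 8))  # hypothetical 4-channel feature map
    out, gate = wavelet_channel_gate(feat)
    print(out.shape, gate.round(3))
```

With the division by 2, the four sub-bands together preserve the energy of the input, so no information is lost by the decomposition; a constant feature map yields zero in all three detail bands.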

Key words: deep learning, semantic segmentation, attention mechanism, wavelet transform, pyramid pooling
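The 60.21% figure reported on CamVid is a mean Intersection over Union (mIoU). As a rough sketch of how that metric is commonly computed, the code below accumulates a confusion matrix and averages the per-class IoU. The function names and the optional `ignore_index` for unlabeled pixels are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=None):
    """Count pixels by (ground-truth class, predicted class).

    Rows index the ground truth, columns index the prediction.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    if ignore_index is not None:
        # Drop pixels whose label should not count (assumed "void" label).
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    """Average IoU over the classes that appear in the labels or predictions."""
    tp = np.diag(cm).astype(float)
    # Union = predicted pixels + ground-truth pixels - their intersection.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()


if __name__ == "__main__":
    target = np.array([[0, 0], [1, 1]])
    pred = np.array([[0, 1], [1, 1]])
    cm = confusion_matrix(pred, target, num_classes=2)
    # Per-class IoU is 1/2 and 2/3, so the mean is 7/12 (about 0.583).
    print(mean_iou(cm))
```

In practice the confusion matrix is summed over every image in the test set before the IoU is taken, so that rare classes are not dominated by per-image noise.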

CLC number: TP393

FAN Runze, LIU Yuhong, ZHANG Rongfen, LI Jingyu. Road Scene Semantic Segmentation Model Based on Multi-Scale Attention Mechanism[J]. Computer Engineering, 2023, 49(2): 288-295.

https://www.ecice06.com/CN/Y2023/V49/I2/288

Figures/Tables: 8


