Computer Engineering, 2023, Vol. 49, Issue (2): 288-295. doi: 10.19678/j.issn.1000-3428.0063257

• Development Research and Engineering Application •

Road Scene Semantic Segmentation Model Based on Multi-Scale Attention Mechanism

FAN Runze, LIU Yuhong, ZHANG Rongfen, LI Jingyu

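  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  • Received: 2021-11-17 Revised: 2022-03-06 Published: 2022-07-04
  • About the authors: FAN Runze (born 1996), male, master's degree candidate, main research interests: deep learning and object detection; LIU Yuhong, professor; ZHANG Rongfen (corresponding author), professor, Ph.D.; LI Jingyu, master's degree candidate.
  • Funding:
    Science and Technology Foundation of Guizhou Province (Qiankehe Jichu-ZK[2021] Key 001).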

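Abstract: Semantic segmentation of road scenes helps vehicles perceive their surroundings so that they can avoid pedestrians, other vehicles, and small obstacles of all kinds, improving driving safety. To address the low recognition accuracy for small objects and the excessive number of network parameters in road-scene semantic segmentation, this paper proposes a semantic segmentation model based on a multi-scale attention mechanism. Exploiting the multi-scale, multi-frequency analysis capability of the wavelet transform, a multi-scale wavelet attention module is designed and embedded in the encoder; by fusing feature information at different scales and frequencies, it retains more edge and contour detail. Hierarchical connections between the encoder and decoder, together with an improved pyramid pooling module, extract features from multiple perspectives, capturing more image detail while preserving contextual information. A multi-level loss function is designed for training, which speeds up network convergence. Experiments on the Cambridge-driving Labeled Video Database (CamVid) show that the model achieves a mean Intersection over Union (mIoU) of 60.21% with nearly 30% fewer parameters than DeepLabV3+ and DenseASPP. The model thus improves segmentation accuracy without adding parameters and remains robust across different scenes.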
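The model relies on the multi-scale, multi-frequency analysis property of the wavelet transform. As a rough illustration only (not the authors' attention module, whose internals the abstract does not describe), the sketch below performs a single-channel 2D Haar wavelet decomposition in pure Python. Each level splits an image into a low-frequency approximation (LL) and three high-frequency detail subbands, and re-splitting LL yields progressively coarser scales. The choice of the Haar wavelet, the helper names `haar_dwt2`, `haar_pyramid` and `energy`, and the LH/HL naming are assumptions; libraries differ on which detail band they call LH and which HL.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar DWT (hypothetical helper).

    Splits an image with even height and width into four half-resolution
    subbands: LL (low frequency) plus three high-frequency detail bands.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + c + d) / 2)  # scaled local average: low frequency
            row_lh.append((a + b - c - d) / 2)  # top minus bottom: horizontal edges
            row_hl.append((a - b + c - d) / 2)  # left minus right: vertical edges
            row_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def haar_pyramid(img: Matrix, levels: int):
    """Multi-scale decomposition: repeatedly re-split the LL band.

    Returns (coarsest LL, [(LH, HL, HH) for levels 1..levels]).
    """
    details = []
    current = img
    for _ in range(levels):
        current, lh, hl, hh = haar_dwt2(current)
        details.append((lh, hl, hh))
    return current, details


def energy(m: Matrix) -> float:
    """Sum of squared coefficients, used as a sanity check."""
    return sum(v * v for row in m for v in row)


if __name__ == "__main__":
    # A 2x2 patch containing a vertical edge: only the HL band responds.
    print(haar_dwt2([[0.0, 1.0], [0.0, 1.0]]))
```

Because the Haar transform used here is orthonormal, the total energy of all subbands equals that of the input, which makes a convenient correctness check. In a network, detail bands like these carry the kind of edge and contour information that a wavelet attention module could fuse back into encoder features; how the paper does so is not specified in the abstract.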

Key words: deep learning, semantic segmentation, attention mechanism, wavelet transform, pyramid pooling
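The headline CamVid result is reported as mean Intersection over Union (mIoU). The pure-Python sketch below shows the standard computation from a confusion matrix: for each class, IoU = TP / (TP + FP + FN), and mIoU is the average over classes. It is a generic illustration, not the authors' evaluation script. The flat label lists, `ignore_index=255` for unlabeled (void) pixels, and skipping classes that appear in neither the prediction nor the ground truth are assumptions; evaluation protocols differ, especially on the last point.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> Matrix:
    """Rows index ground-truth classes, columns index predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumed convention: void pixels are not scored
        cm[t][p] += 1
    return cm


def per_class_iou(cm: Matrix) -> List[float]:
    """IoU = TP / (TP + FP + FN); classes absent everywhere are skipped."""
    n = len(cm)
    ious = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                      # class k pixels predicted as other classes
        fp = sum(cm[r][k] for r in range(n)) - tp  # other pixels predicted as class k
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return ious


def mean_iou(cm: Matrix) -> float:
    ious = per_class_iou(cm)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(mean_iou(confusion_matrix(pred, target, num_classes=3)))
```

Averaging per-class IoU (rather than pooling all pixels) gives small classes such as poles or signs the same weight as large ones like road or sky, which is why mIoU is sensitive to the small-object accuracy the paper targets.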
