Lightweight Road Image Segmentation Algorithm Based on Multi-Scale Feature Fusion for Blind Guiding Scenarios

doi:10.19678/j.issn.1000-3428.0068674

Abstract:

Image segmentation is a key technology for environmental perception and is widely used in tasks such as autonomous driving and virtual reality. Computer vision-based blind guiding systems have matured steadily and now outperform traditional solutions in accuracy and stability. Semantic segmentation of road images is an essential component of such a system: by analyzing the segmentation output, the system determines the state of the surrounding environment, guides the user around obstacles ahead, and finds the optimal path. Because visual blind guiding systems operate in complex environments, they place very high demands on both running efficiency and segmentation accuracy. However, commonly used high-precision semantic segmentation algorithms have large parameter counts and run slowly, so they cannot be applied directly to blind guiding systems. To address this problem, this paper proposes a lightweight road image segmentation algorithm based on multiscale features. The model contains two feature extraction branches, the Detail Branch and the Semantic Branch. The Detail Branch extracts low-level detail information from the image, while the Semantic Branch extracts high-level semantic information. Multiscale features from both branches are also processed by the designed feature mapping module, which improves the model's feature modeling capability. In addition, a simple and efficient feature fusion module is designed; by fusing features of different scales, it strengthens the model's ability to encode contextual information. A large amount of road segmentation data suitable for blind guiding scenarios was collected and labeled to build a corresponding dataset, on which the proposed algorithm was trained and tested. The experimental results show that the proposed method achieves a mean Intersection over Union (mIoU) of 96.5%, which is better than that of existing image segmentation models. With 1 024×1 024 pixel images as input, the lightweight version of the proposed model runs at 201 frames per second on an NVIDIA RTX 3090 Ti, which is faster than existing lightweight image segmentation models. Deployed on an NVIDIA AGX Xavier device, the model runs at 53 frames per second in real-world tests, which meets practical requirements.

Key words: road segmentation, multi-scale model, visual blind guiding system, deep learning, feature fusion, scene understanding
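The abstract reports accuracy as mean Intersection over Union (mIoU). For readers unfamiliar with the metric, the following is a minimal sketch of how it is commonly computed. It assumes flattened integer label maps; the function name `mean_iou` and the convention of skipping classes that appear in neither map are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average the per-class IoU over classes present in pred or target.

    Assumption: classes absent from both maps are skipped rather than
    counted as IoU = 0 or 1. Evaluation toolkits differ on this detail.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both maps (intersection) or in either map (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, two classes (0 = background, 1 = road).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, num_classes=2))  # 0.5 (each class has IoU 2/4)
```

In practice the per-class intersections and unions are usually accumulated over the whole test set before dividing, which can be done by summing the counts across images instead of averaging per-image scores.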

SHA Yuyang, LU Jingtao, DU Haofan, ZHAI Xiaobing, MENG Weiyu, LIAN Xu, LUO Gang, LI Kefeng. Lightweight Road Image Segmentation Algorithm Based on Multi-Scale Feature Fusion for Blind Guiding Scenarios[J]. Computer Engineering, 2025, 51(7): 314-325.

https://www.ecice06.com/CN/Y2025/V51/I7/314

Figures/Tables: 14

Fig.1 Diagram of proposed method

Fig.2 Details of Stem Block

Fig.3 Architecture of FPM

Fig.4 MSFF structure

Fig.5 Visualization of collected road segmentation dataset

Fig.6 Segmentation results of algorithms

Fig.7 Some visualization results of different methods on Cityscapes

Fig.8 Some visualization results of different methods on CamVid
