
Computer Engineering, 2025, Vol. 51, Issue (7): 314-325. doi: 10.19678/j.issn.1000-3428.0068674

• Graphics and Image Processing •

Lightweight Road Image Segmentation Algorithm Based on Multi-Scale Feature Fusion for Blind Guiding Scenarios

SHA Yuyang1, LU Jingtao2, DU Haofan3, ZHAI Xiaobing1, MENG Weiyu1, LIAN Xu1, LUO Gang1, LI Kefeng1,*

  1. Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
  2. School of Mathematics, Statistics and Mechanics, Beijing University of Technology, Beijing 100124, China
  3. School of Physics and Technology, Nanjing Normal University, Nanjing 210023, Jiangsu, China
  • Received: 2023-10-24 Online: 2024-05-08 Published: 2025-07-15
  • Corresponding author: LI Kefeng
  • Funding:
    Macao Science and Technology Development Fund (0033/2023/RIB2); Macao Polytechnic University Fund (RP/FCA-14/2023)

Abstract:

Image segmentation is a key technology for environmental perception and is widely used in applications such as autonomous driving and virtual reality. As the technology has developed, computer vision-based blind guiding systems have matured and now outperform traditional solutions in accuracy and stability. Semantic segmentation of road images is an essential component of a visual blind guiding system: by analyzing the segmentation output, the system can determine the user's current surroundings, guide the user around obstacles ahead, and find the best path forward. Because visual blind guiding systems operate in complex environments, they demand both high running efficiency and high segmentation accuracy. However, common high-accuracy semantic segmentation algorithms have many parameters and run slowly, so they cannot be applied directly to blind guiding systems. To address this problem, this paper proposes a lightweight road image segmentation algorithm based on multi-scale features. The model contains two feature extraction branches: the Detail Branch extracts low-level detail information from the image, and the Semantic Branch extracts high-level semantic information. The multi-scale features of both branches are further processed by a purpose-designed feature mapping module, improving the model's feature modeling capability. In addition, a simple and efficient feature fusion module combines features of different scales to strengthen the model's encoding of contextual information. A large amount of road segmentation data suited to blind guiding scenarios was collected and annotated to build a dedicated dataset, on which the proposed algorithm was trained and tested.

The experimental results show that the proposed road segmentation algorithm achieves a mean Intersection over Union (mIoU) of 96.5%, outperforming existing image segmentation models. With 1024×1024-pixel input images, the lightweight version of the proposed algorithm runs at 201 frames/s on an NVIDIA GeForce RTX 3090 Ti, faster than existing lightweight image segmentation models. Deployed on an NVIDIA AGX Xavier device, the model runs at 53 frames/s in real-world tests, which meets practical requirements.

Key words: road segmentation, multi-scale model, visual blind guiding system, deep learning, feature fusion, scene understanding
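To make the idea of fusing features from the two branches more concrete, the sketch below shows one common way to combine a high-resolution Detail Branch map with a low-resolution Semantic Branch map: nearest-neighbour upsampling, channel concatenation, and a 1×1 projection followed by ReLU. The abstract does not describe the internals of the authors' feature fusion module, so this generic pattern, the hypothetical function names `upsample_nearest` and `fuse_features`, and all tensor shapes are assumptions for illustration only.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_features(detail: np.ndarray, semantic: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: upsample, concatenate on channels, 1x1 projection, ReLU.

    detail:   (C_d, H, W)         high-resolution, low-level features
    semantic: (C_s, H/s, W/s)     low-resolution, high-level features
    weight:   (C_out, C_d + C_s)  1x1 convolution kernel (no bias, assumed)
    """
    factor = detail.shape[1] // semantic.shape[1]
    up = upsample_nearest(semantic, factor)
    if up.shape[1:] != detail.shape[1:]:
        raise ValueError("spatial sizes must differ by an integer factor")
    stacked = np.concatenate([detail, up], axis=0)      # (C_d + C_s, H, W)
    fused = np.einsum("oc,chw->ohw", weight, stacked)   # 1x1 conv == per-pixel matmul
    return np.maximum(fused, 0.0)                       # ReLU


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    detail = rng.standard_normal((16, 64, 64))    # assumed Detail Branch output
    semantic = rng.standard_normal((64, 8, 8))    # assumed Semantic Branch output
    weight = rng.standard_normal((32, 16 + 64))   # assumed 32 output channels
    print(fuse_features(detail, semantic, weight).shape)  # (32, 64, 64)
```

Concatenation plus a learned 1×1 projection lets each output channel weight the two branches differently; element-wise addition after matching channel counts is a cheaper alternative often used in lightweight models.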
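The mIoU figure reported in the abstract averages, over classes, the intersection of predicted and ground-truth pixels divided by their union, i.e. IoU = TP / (TP + FP + FN) per class. A minimal pure-Python sketch, assuming label maps are flattened to lists of integer class indices; the function names are hypothetical, and skipping classes absent from both maps is one common convention, not necessarily the one used in the paper.

```python
from typing import Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int) -> list[list[int]]:
    """Pixel counts: rows are ground-truth classes, columns are predicted classes."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    cm = confusion_matrix(pred, target, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # class-c pixels labeled otherwise
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # other pixels labeled as c
        union = tp + fp + fn
        if union > 0:                                         # skip classes absent from both maps
            ious.append(tp / union)
    if not ious:
        raise ValueError("no pixels to evaluate")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Class 0: IoU 1/2; class 1: IoU 2/3; mean 7/12.
    print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 4))  # 0.5833
```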
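Throughput in frames per second converts to an average per-frame processing time of 1000/FPS milliseconds. Applied to the two speeds reported in the abstract, this gives the following; it assumes frames are processed one at a time, since the abstract does not mention batching or pipelining.

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{\text{FPS}}, \qquad
t_{\text{desktop GPU}} = \frac{1000}{201} \approx 4.98\ \text{ms}, \qquad
t_{\text{AGX Xavier}} = \frac{1000}{53} \approx 18.9\ \text{ms}
```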