
Computer Engineering, 2024, Vol. 50, Issue (3): 242-249. doi: 10.19678/j.issn.1000-3428.0067370

• Graphics and Image Processing •

Visual SLAM Method Based on Semantic Segmentation in Dynamic Scenes

Xiaoying DU1, Qingni YUAN1,2,3,*, Jianyou QI1, Chen WANG1, Feilong DU1, Ao REN1

  1. Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang 550025, Guizhou, China
  2. School of Mechanical Engineering, Guizhou University, Guiyang 550025, Guizhou, China
  3. State Key Laboratory of Public Big Data Jointly Built by Provincial and Ministerial Governments, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2023-04-06  Published: 2024-03-15  Online: 2023-06-16
  • Corresponding author: Qingni YUAN
  • Funding:
    National Natural Science Foundation of China (52165063, 52065010); Science and Technology Department of Guizhou Province ([2022] Key 024, [2022] General 140, [2023] General 094, [2023] General 025); Guizhou University Laboratory Open Project (SYSKF2023-089)

Abstract:

To address the poor robustness of visual Simultaneous Localization And Mapping (SLAM) in dynamic scenes, where moving objects readily degrade localization and mapping accuracy, a semantic visual SLAM algorithm based on an improved DeepLabv3plus semantic segmentation network and multiview geometry is designed. Starting from DeepLabv3plus, the lightweight convolutional network MobileNetV2 is used for feature extraction, depthwise separable convolutions replace the standard convolutions in the Atrous Spatial Pyramid Pooling (ASPP) module, and an attention mechanism is introduced, yielding an improved DeepLabv3plus network. The improved network is then combined with multiview geometry in a dynamic point detection method that makes visual SLAM more robust in dynamic scenes. On this basis, a three-dimensional static semantic map containing both semantic and geometric information is constructed. Experimental results on the TUM dataset show that, compared with ORB-SLAM2, the proposed algorithm reduces the Root Mean Square Error (RMSE) and Standard Deviation (SD) of the Absolute Trajectory Error (ATE) on highly dynamic sequences by up to 98% and 97%, respectively.

Key words: DeepLabv3plus network, visual Simultaneous Localization And Mapping (SLAM), multiview geometry, dynamic scenes, semantic map
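The abstract replaces the standard convolutions in ASPP with depthwise separable convolutions. The saving can be seen by counting weights: a k × k standard convolution from C_in to C_out channels has k²·C_in·C_out weights, while a depthwise separable one (a k × k depthwise convolution followed by a 1 × 1 pointwise convolution) has k²·C_in + C_in·C_out. The sketch below assumes C_in = 320, the width of MobileNetV2's last bottleneck stage, and C_out = 256, the usual DeepLabv3 ASPP width; the paper's actual channel sizes are not given in the abstract, and the function names are illustrative.

```python
# Weight counts for one ASPP branch. The channel sizes below are illustrative
# assumptions (not taken from the paper); bias terms are ignored.


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k standard (possibly dilated) convolution."""
    # Dilation spreads the kernel taps apart but does not add weights.
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 320, 256  # assumed: MobileNetV2 output width -> ASPP width
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(sep / std, 3))  # 737280 84800 0.115
```

The ratio works out to 1/C_out + 1/k², so with 3 × 3 kernels each separable branch needs roughly 11.5% of the weights of its standard counterpart under these assumed sizes.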
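The abstract does not say which attention mechanism is introduced. As an illustrative assumption only, the sketch below uses Squeeze-and-Excitation (SE) channel attention, one common choice for segmentation networks: each channel is globally average-pooled, passed through a small two-layer bottleneck, and rescaled by a sigmoid weight. The function name `se_attention`, the toy weights, and the omission of bias terms are all hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]  # one H x W channel


def se_attention(feature: List[Grid], w1: List[List[float]], w2: List[List[float]]) -> List[Grid]:
    """Squeeze-and-Excitation style channel attention (assumed module, biases omitted).

    feature: C channels, each an H x W grid.
    w1: (C // r) rows of C weights (reduction layer).
    w2: C rows of (C // r) weights (expansion layer).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Reweight: every value in channel c is multiplied by scale[c].
    return [[[s * x for x in row] for row in ch] for s, ch in zip(scale, feature)]


if __name__ == "__main__":
    feat = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]  # C = 2, H = W = 2
    out = se_attention(feat, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])  # reduction r = 2
    print([round(ch[0][0], 4) for ch in out])  # [1.7616, 0.1192]
```

With these toy weights the first channel keeps about 88% of its magnitude and the second about 12%, which is the kind of channel reweighting such a module learns from data.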
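The abstract does not detail the dynamic point test. A widely used multiview-geometry check, similar in spirit to DynaSLAM, back-projects a keypoint from a reference frame using its depth, moves it into the current camera with the relative pose, and compares the predicted depth with the depth measured at the reprojected pixel; a measured depth much smaller than predicted suggests that a moving object now occludes the point. The sketch below ORs that test with a semantic prior. The pinhole model, the OR combination, the helper names, and the 0.4 m threshold `tau_z` are assumptions, not the paper's stated method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u: float, v: float, z: float, K: Intrinsics) -> Vec3:
    """Pinhole model: pixel (u, v) with depth z -> 3D point in that camera's frame."""
    return ((u - K.cx) * z / K.fx, (v - K.cy) * z / K.fy, z)


def rigid_transform(R: Sequence[Sequence[float]], t: Vec3, p: Vec3) -> Vec3:
    """p' = R p + t, here mapping reference-camera coordinates to the current camera."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def is_dynamic(u: float, v: float, z_ref: float, R: Sequence[Sequence[float]], t: Vec3,
               K: Intrinsics, depth_cur: List[List[float]], in_dynamic_class: bool,
               tau_z: float = 0.4) -> bool:
    """Hypothetical semantic + multiview-geometry test for one keypoint."""
    if in_dynamic_class:  # semantic prior, e.g. the pixel is segmented as "person"
        return True
    x, y, z_proj = rigid_transform(R, t, back_project(u, v, z_ref, K))
    if z_proj <= 0:
        return False  # behind the current camera: no usable evidence
    u2 = round(K.fx * x / z_proj + K.cx)
    v2 = round(K.fy * y / z_proj + K.cy)
    if not (0 <= v2 < len(depth_cur) and 0 <= u2 < len(depth_cur[0])):
        return False  # projects outside the current image
    z_meas = depth_cur[v2][u2]
    if z_meas <= 0:
        return False  # no valid depth measurement at that pixel
    # A surface much closer than predicted suggests a moving object occludes the point.
    return z_proj - z_meas > tau_z


if __name__ == "__main__":
    K = Intrinsics(fx=2.0, fy=2.0, cx=1.0, cy=1.0)
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    depth = [[2.0] * 3 for _ in range(3)]
    print(is_dynamic(1, 1, 2.0, I3, (0.0, 0.0, 0.0), K, depth, False))  # False
    depth[1][1] = 1.0  # something now sits 1 m in front of the expected surface
    print(is_dynamic(1, 1, 2.0, I3, (0.0, 0.0, 0.0), K, depth, False))  # True
```

Points flagged this way would simply be excluded from tracking; how the paper weights or combines the two cues may differ.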
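For the static semantic map, one plausible approach, assumed here because the abstract gives no detail, is to back-project every pixel of an RGB-D frame that has valid depth and is not flagged dynamic, transform it into the world frame with the estimated camera pose, and keep its class label. The function `static_semantic_cloud` and its parameters are hypothetical, and a plain point list stands in for whatever map structure the paper actually uses.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def static_semantic_cloud(depth: List[List[float]], labels: List[List[int]],
                          dynamic_mask: List[List[bool]],
                          fx: float, fy: float, cx: float, cy: float,
                          R_wc: Sequence[Sequence[float]], t_wc: Sequence[float],
                          stride: int = 1) -> List[Tuple[Point, int]]:
    """Back-project static pixels of one RGB-D frame into world coordinates.

    Returns (world_point, class_id) pairs. Pixels flagged dynamic or lacking
    depth are skipped, so moving objects never enter the map.
    """
    cloud = []
    for v in range(0, len(depth), stride):
        for u in range(0, len(depth[0]), stride):
            z = depth[v][u]
            if z <= 0 or dynamic_mask[v][u]:
                continue
            # Pinhole back-projection into the camera frame.
            p_c = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Camera -> world using the pose estimated by the SLAM front end.
            p_w = tuple(sum(R_wc[i][j] * p_c[j] for j in range(3)) + t_wc[i] for i in range(3))
            cloud.append((p_w, labels[v][u]))
    return cloud


if __name__ == "__main__":
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cloud = static_semantic_cloud([[1.0, 1.0], [1.0, 0.0]], [[1, 2], [3, 4]],
                                  [[False, True], [False, False]],
                                  1.0, 1.0, 0.0, 0.0, I3, (0.0, 0.0, 0.0))
    print(cloud)  # masked pixel (0, 1) and zero-depth pixel (1, 1) are skipped
```

The `stride` parameter is an illustrative way to subsample dense frames before fusing many of them into one map.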
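The reported gains are relative reductions in the RMSE and SD of the absolute trajectory error. A minimal sketch of those statistics follows, assuming the per-pose translational errors have already been computed after aligning the estimated trajectory with ground truth (as the TUM RGB-D benchmark's evaluation scripts do), and assuming improvement is defined as (baseline − ours) / baseline × 100%. The function names and the numbers in the usage lines are made up purely to exercise the formulas; they are not results from the paper.

```python
import math
from typing import Sequence, Tuple


def ate_rmse_sd(errors: Sequence[float]) -> Tuple[float, float]:
    """RMSE and standard deviation of per-pose absolute trajectory errors.

    `errors` are distances between aligned estimated and ground-truth positions;
    the alignment step itself is assumed to be done already. SD uses the
    population form (divide by n).
    """
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean = sum(errors) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    return rmse, sd


def improvement_percent(baseline: float, ours: float) -> float:
    """Relative error reduction versus a baseline such as ORB-SLAM2, in percent."""
    return (baseline - ours) / baseline * 100.0


if __name__ == "__main__":
    # Made-up errors in metres, only to exercise the formulas.
    rmse, sd = ate_rmse_sd([0.3, 0.4])
    print(round(rmse, 4), round(sd, 4))  # 0.3536 0.05
    print(round(improvement_percent(0.4, 0.1), 1))  # 75.0
```

Because lower error is better, a positive percentage here means the proposed method's error is smaller than the baseline's.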