Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 199-206, 214. doi: 10.19678/j.issn.1000-3428.0065522

• Graphics and Image Processing •

Visual SLAM Algorithm Based on Target Detection and Semantic Segmentation

Chunbo XU, Juan YAN, Huibin YANG, Bo WANG, Han WU

  1. School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2022-08-16 Online: 2023-08-15 Published: 2023-08-16
  • About the authors:

    Chunbo XU (born 1997), male, master's student; main research interests: computer vision and intelligent control

    Juan YAN, associate professor, master's degree

    Huibin YANG, experimentalist, master's degree

    Bo WANG, master's student

    Han WU, master's student

  • Funding:
    Project commissioned by Shanghai enterprises and public institutions ((19)JQ-009)

Abstract:

Most Visual Simultaneous Localization and Mapping (VSLAM) algorithms are designed for static scenes and do not account for dynamic objects. In real scenes, however, dynamic objects cause feature-point mismatches in visual odometry, which degrades the localization and mapping accuracy of the VSLAM system and reduces its robustness in practical applications. For indoor dynamic environments, a VSLAM algorithm built on the ORB-SLAM3 framework, named RDTS-SLAM, is proposed. An improved YOLOv5 network for target detection and semantic segmentation segments objects in the environment accurately and quickly. The detection results are then combined with a local optical flow method to identify dynamic objects, and the feature points inside dynamic object regions are removed, so that only static feature points are used for feature matching and for the subsequent localization and mapping. Experimental results on the TUM RGB-D dataset and on real-environment data show that, compared with ORB-SLAM3 and RDS-SLAM, RDTS-SLAM reduces the Root Mean Square Error (RMSE) of the estimated trajectory on the walking_rpy sequence by 95.38% and 86.20%, respectively, and can significantly improve the robustness and accuracy of a VSLAM system in dynamic environments.

Key words: Visual Simultaneous Localization and Mapping (VSLAM), target detection, semantic segmentation, YOLOv5 network, local optical flow method
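The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of combining detection boxes with optical flow to discard feature points on moving objects. It is not the authors' RDTS-SLAM code. Everything named here is hypothetical: the functions `points_in_box` and `static_keypoint_mask`, the threshold `motion_thresh`, and the use of the median background flow as a stand-in for camera-induced motion. The per-keypoint flow is assumed to be precomputed (for example by a sparse optical flow tracker), the boxes stand in for detector output, and culling is done at box level rather than with segmentation masks.

```python
import numpy as np


def points_in_box(points, box):
    """Boolean mask of the points (N, 2) lying inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (
        (points[:, 0] >= x1) & (points[:, 0] <= x2)
        & (points[:, 1] >= y1) & (points[:, 1] <= y2)
    )


def static_keypoint_mask(points, flows, boxes, motion_thresh=2.0):
    """Return a boolean mask that is True for keypoints kept as static.

    points: (N, 2) keypoint pixel coordinates (x, y).
    flows:  (N, 2) per-keypoint optical flow in pixels (assumed precomputed).
    boxes:  iterable of (x1, y1, x2, y2) detection boxes.
    motion_thresh: hypothetical threshold (pixels) on residual motion.
    """
    points = np.asarray(points, dtype=float)
    flows = np.asarray(flows, dtype=float)
    in_box = [points_in_box(points, b) for b in boxes]

    # Keypoints that fall inside at least one detection box.
    in_any = np.zeros(len(points), dtype=bool)
    for m in in_box:
        in_any |= m

    # Assumption: the median flow of background keypoints approximates
    # the image motion caused by the camera alone.
    background = flows[~in_any]
    camera_flow = np.median(background, axis=0) if len(background) else np.zeros(2)

    keep = np.ones(len(points), dtype=bool)
    for m in in_box:
        if not m.any():
            continue
        # Motion left over after subtracting the camera-induced flow.
        residual = np.linalg.norm(flows[m] - camera_flow, axis=1)
        if np.median(residual) > motion_thresh:
            keep &= ~m  # treat the whole box as dynamic and drop its keypoints
    return keep


if __name__ == "__main__":
    points = np.array([[10, 10], [200, 50], [55, 60], [60, 70], [150, 150], [155, 160]])
    flows = np.array([[1.0, 0.0], [1.1, 0.0], [6.0, 0.5], [5.5, 0.2], [1.0, 0.1], [0.9, 0.0]])
    boxes = [(40, 40, 80, 90), (140, 140, 170, 170)]
    print(static_keypoint_mask(points, flows, boxes).tolist())
    # prints [True, True, False, False, True, True]
```

In this toy input the first box contains keypoints whose flow differs strongly from the background, so they are dropped, while the second box moves with the background and its keypoints are kept. A real system would pass only the kept keypoints on to feature matching and pose estimation.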