Visual SLAM Algorithm Based on Target Detection and Semantic Segmentation

doi:10.19678/j.issn.1000-3428.0065522

Abstract:

Most visual simultaneous localization and mapping (VSLAM) algorithms are designed for static scenes and do not account for dynamic objects. In real scenes, however, dynamic objects cause feature-point mismatches in the visual odometry, which degrades the localization and mapping accuracy of a VSLAM system and reduces its robustness in practical applications. For indoor dynamic environments, this paper proposes RDTS-SLAM, a VSLAM algorithm built on the ORB-SLAM3 framework. An improved YOLOv5 target detection and semantic segmentation network segments the objects in the environment accurately and quickly. The target detection results are combined with a local optical flow method to identify dynamic objects, and the feature points inside dynamic object regions are discarded, so that only static feature points are used for feature matching and for the subsequent localization and mapping. Experiments on the TUM RGB-D dataset and on real-environment data show that, compared with ORB-SLAM3 and RDS-SLAM, RDTS-SLAM reduces the root mean square error (RMSE) of trajectory estimation on the walking_rpy sequence by 95.38% and 86.20%, respectively, which indicates a substantial gain in the robustness and accuracy of VSLAM in dynamic environments.

Key words: visual simultaneous localization and mapping (VSLAM), target detection, semantic segmentation, YOLOv5 network, local optical flow method
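The culling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses detection boxes rather than segmentation masks, assumes the per-keypoint optical flow has already been computed, and the function name `cull_dynamic_keypoints` and the pixel threshold `threshold` are hypothetical choices made only for this example. A box is treated as dynamic when the flow of the points inside it deviates from the background (camera-induced) flow by more than the threshold.

```python
import numpy as np

def cull_dynamic_keypoints(keypoints, flow, boxes, threshold=1.5):
    """Drop keypoints that lie inside detection boxes judged to be dynamic.

    keypoints: (N, 2) array-like of (x, y) pixel coordinates.
    flow:      (N, 2) array-like of optical-flow vectors, one per keypoint.
    boxes:     list of (x_min, y_min, x_max, y_max) detection boxes.
    threshold: hypothetical limit in pixels on the median residual flow
               inside a box before the box is treated as dynamic.
    Returns the kept keypoints and the boolean keep mask.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    flow = np.asarray(flow, dtype=float)

    # inside[i, j] is True when keypoint j falls inside box i.
    inside = np.zeros((len(boxes), len(keypoints)), dtype=bool)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        inside[i] = ((keypoints[:, 0] >= x0) & (keypoints[:, 0] <= x1) &
                     (keypoints[:, 1] >= y0) & (keypoints[:, 1] <= y1))

    # Approximate camera-induced motion by the median flow outside all boxes.
    background = ~inside.any(axis=0)
    bg_flow = np.median(flow[background], axis=0) if background.any() else np.zeros(2)

    # Residual motion of each keypoint once the background flow is removed.
    residual = np.linalg.norm(flow - bg_flow, axis=1)

    keep = np.ones(len(keypoints), dtype=bool)
    for i in range(len(boxes)):
        if inside[i].any() and np.median(residual[inside[i]]) > threshold:
            keep &= ~inside[i]  # discard every keypoint inside a dynamic box
    return keypoints[keep], keep
```

Only the keypoints returned here would be passed on to feature matching and pose estimation. Using the median inside each box keeps a few outlier flow vectors from flipping the decision for the whole object.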

Chunbo XU, Juan YAN, Huibin YANG, Bo WANG, Han WU. Visual SLAM Algorithm Based on Target Detection and Semantic Segmentation[J]. Computer Engineering, 2023, 49(8): 199-206, 214.

https://www.ecice06.com/CN/Y2023/V49/I8/199

Sequence	ORB-SLAM3 RMSE	ORB-SLAM3 Mean	ORB-SLAM3 Median	ORB-SLAM3 SD	RDS-SLAM RMSE	RDS-SLAM Mean	RDS-SLAM Median	RDS-SLAM SD	RDTS-SLAM RMSE	RDTS-SLAM Mean	RDTS-SLAM Median	RDTS-SLAM SD
walking_rpy	0.1752	0.1376	0.1112	0.1084	0.0587	0.0408	0.0399	0.0380	0.0081	0.0064	0.0060	0.0048
walking_xyz	0.3373	0.3234	0.3220	0.0957	0.0240	0.0221	0.0215	0.0139	0.0109	0.0097	0.0087	0.0048
walking_halfsphere	0.6947	0.6272	0.5696	0.2987	0.0306	0.0287	0.0265	0.0171	0.0223	0.0207	0.0201	0.0194
walking_static	0.0254	0.0211	0.0181	0.0142	0.0720	0.0655	0.0587	0.0343	0.0146	0.0138	0.0152	0.0091
sitting_halfsphere	0.0251	0.0218	0.0182	0.0125	0.0102	0.0099	0.0094	0.0086	0.0186	0.0165	0.0151	0.0085
sitting_static	0.0377	0.0334	0.0329	0.0176	0.0084	0.0082	0.0079	0.0043	0.0294	0.0286	0.0266	0.0137
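The percentages quoted in the abstract follow directly from the walking_rpy row of the table above. The short check below recomputes them; the helper name `relative_reduction` is introduced here only for illustration.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error relative to `baseline`."""
    return (baseline - proposed) / baseline * 100.0

# walking_rpy RMSE values copied from the table above.
rmse = {"ORB-SLAM3": 0.1752, "RDS-SLAM": 0.0587, "RDTS-SLAM": 0.0081}

for name in ("ORB-SLAM3", "RDS-SLAM"):
    reduction = relative_reduction(rmse[name], rmse["RDTS-SLAM"])
    print(f"RMSE reduction vs {name}: {reduction:.2f}%")
# prints:
# RMSE reduction vs ORB-SLAM3: 95.38%
# RMSE reduction vs RDS-SLAM: 86.20%
```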

Sequence	ORB-SLAM3 RMSE	ORB-SLAM3 Mean	ORB-SLAM3 Median	ORB-SLAM3 SD	RDS-SLAM RMSE	RDS-SLAM Mean	RDS-SLAM Median	RDS-SLAM SD	RDTS-SLAM RMSE	RDTS-SLAM Mean	RDTS-SLAM Median	RDTS-SLAM SD
walking_rpy	0.0314	0.0069	0.0031	0.0306	0.0274	0.0255	0.0232	0.0140	0.0090	0.0051	0.0025	0.0074
walking_xyz	0.0262	0.0095	0.0083	0.0243	0.0269	0.0247	0.0221	0.0163	0.0124	0.0071	0.0069	0.0087
walking_halfsphere	0.0297	0.0149	0.0102	0.0257	0.0274	0.0252	0.0231	0.0140	0.0131	0.0092	0.0079	0.0091
walking_static	0.0011	0.0008	0.0006	0.0007	0.0221	0.0186	0.0162	0.0149	0.0009	0.0007	0.0005	0.0006
sitting_halfsphere	0.0101	0.0092	0.0079	0.0054	0.0053	0.0049	0.0041	0.0031	0.0091	0.0078	0.0068	0.0047
sitting_static	0.0014	0.0008	0.0005	0.0011	0.0050	0.0044	0.0038	0.0026	0.0011	0.0006	0.0004	0.0009
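The four statistics reported per algorithm (RMSE, Mean, Median, SD) can be computed from a vector of per-frame errors as sketched below. This is an illustrative helper, not the evaluation script used in the paper; the name `error_stats` and the use of the population standard deviation (`ddof=0`) are assumptions. With that convention the identity RMSE² = Mean² + SD² holds, and the ORB-SLAM3 walking_rpy row above satisfies it to four decimal places.

```python
import math
import numpy as np

def error_stats(errors):
    """Summary statistics of a sequence of non-negative per-frame errors."""
    e = np.asarray(errors, dtype=float)
    return {
        "rmse": float(np.sqrt(np.mean(e ** 2))),
        "mean": float(np.mean(e)),
        "median": float(np.median(e)),
        "sd": float(np.std(e)),  # population SD (ddof=0), an assumption
    }

# Made-up error values, used only to exercise the function.
stats = error_stats([0.01, 0.02, 0.03, 0.06])

# Identity between the columns: rmse^2 == mean^2 + sd^2.
assert math.isclose(stats["rmse"] ** 2, stats["mean"] ** 2 + stats["sd"] ** 2)

# Same identity applied to the ORB-SLAM3 walking_rpy row above
# (Mean 0.0069, SD 0.0306, reported RMSE 0.0314).
assert abs(math.hypot(0.0069, 0.0306) - 0.0314) < 1e-4
```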

Algorithm	GPU	Network model	Detection and segmentation time	Tracking time per frame
ORB-SLAM3	-	-	-	22-30
RDS-SLAM	GeForce RTX 2080Ti	Mask R-CNN	200	50-65
RDTS-SLAM	GeForce RTX 2080Ti	YOLOv5	15	30-45
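The timing table states no unit, so the comparison below uses only ratios, which do not depend on the unit. Summarising each tracking-time range by its midpoint is an assumption made for this sketch, not something the source does.

```python
# Values copied from the timing table above (unit not stated in the source).
detect_time = {"RDS-SLAM": 200, "RDTS-SLAM": 15}
track_time = {"ORB-SLAM3": (22, 30), "RDS-SLAM": (50, 65), "RDTS-SLAM": (30, 45)}

def midpoint(interval):
    """Midpoint of a (low, high) range; a summary chosen for this sketch."""
    low, high = interval
    return (low + high) / 2

# How many times faster the YOLOv5-based network is than Mask R-CNN.
speedup = detect_time["RDS-SLAM"] / detect_time["RDTS-SLAM"]
print(f"detection and segmentation speed-up: {speedup:.1f}x")

# Per-frame tracking cost relative to plain ORB-SLAM3.
base = midpoint(track_time["ORB-SLAM3"])
for name in ("RDS-SLAM", "RDTS-SLAM"):
    print(f"{name} tracking cost: {midpoint(track_time[name]) / base:.2f}x ORB-SLAM3")
```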
