A Visual SLAM Algorithm for Motion Redetection Combined with Deep Learning

doi:10.19678/j.issn.1000-3428.0061041

Abstract

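In real-world scenes, traditional visual Simultaneous Localization and Mapping (SLAM) algorithms are constrained by the assumption of a static environment. Moving objects cause a large number of mismatches in traditional visual odometry. This degrades the accuracy of the entire SLAM algorithm and prevents the system from running stably in real scenes. Based on deep learning and multi-view geometry, this paper proposes a visual SLAM algorithm for indoor dynamic environments. An object detection network first pre-detects dynamic objects to identify potentially moving objects. Guided by the pre-detection results, multi-view geometry then re-detects motion to confirm which objects are actually moving, and the objects in the scene are divided into two states, dynamic and static. For the tracking thread and the local mapping thread, a semantic data association method and a keyframe selection strategy are proposed to reduce the influence of moving objects on accuracy and to improve system stability. Experimental results on the public TUM (Technical University of Munich) dataset show that, in dynamic scenes, the average root-mean-square error of the proposed algorithm is 40% lower than that of ORB-SLAM2. Compared with DynaSLAM, which also removes moving objects, the proposed algorithm improves real-time performance by more than a factor of 10, with clear gains in both running speed and accuracy.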
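
The abstract does not state which multi-view geometric test is used for re-detection. As a minimal sketch, assuming the common epipolar-constraint check, the Python code below measures how far each matched feature in the current frame lies from its epipolar line and marks a detector box as dynamic when too large a share of its features violate the constraint. The fundamental matrix `F`, the thresholds `dist_thresh` and `ratio_thresh`, and all function names are hypothetical illustrations, not the paper's method. In practice `F` would be estimated robustly (for example with RANSAC) from features outside the candidate boxes.

```python
import numpy as np

# Hypothetical sketch: epipolar-constraint check for re-detecting motion inside
# object-detector boxes. Names and thresholds are illustrative assumptions.


def _in_box(pts, box):
    """Boolean mask of points (N, 2) that fall inside box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return (pts[:, 0] >= x0) & (pts[:, 0] <= x1) & (pts[:, 1] >= y0) & (pts[:, 1] <= y1)


def epipolar_distances(F, pts1, pts2):
    """Pixel distance of each pts2[i] from the epipolar line F @ [pts1[i], 1]."""
    pts1 = np.asarray(pts1, dtype=float)
    pts2 = np.asarray(pts2, dtype=float)
    ones = np.ones((len(pts1), 1))
    p1h = np.hstack([pts1, ones])  # homogeneous points in frame 1, shape (N, 3)
    p2h = np.hstack([pts2, ones])  # homogeneous points in frame 2, shape (N, 3)
    lines = p1h @ F.T  # row i is F @ p1h[i], the line (a, b, c) in frame 2
    num = np.abs(np.sum(lines * p2h, axis=1))  # |a*x + b*y + c|
    den = np.maximum(np.hypot(lines[:, 0], lines[:, 1]), 1e-12)  # avoid divide-by-zero
    return num / den


def classify_boxes(F, pts1, pts2, boxes, dist_thresh=1.0, ratio_thresh=0.5):
    """Label each box in frame 2 as 'dynamic' or 'static' by its share of outliers."""
    pts2 = np.asarray(pts2, dtype=float)
    moving = epipolar_distances(F, pts1, pts2) > dist_thresh
    labels = []
    for box in boxes:
        inside = _in_box(pts2, box)
        # A box with no matched features gives no evidence of motion: keep it static.
        ratio = moving[inside].mean() if inside.any() else 0.0
        labels.append("dynamic" if ratio > ratio_thresh else "static")
    return labels


def static_feature_mask(pts2, boxes, labels):
    """True for features lying outside every box labelled 'dynamic'."""
    pts2 = np.asarray(pts2, dtype=float)
    keep = np.ones(len(pts2), dtype=bool)
    for box, label in zip(boxes, labels):
        if label == "dynamic":
            keep &= ~_in_box(pts2, box)
    return keep


# Toy example assuming identity intrinsics and a pure horizontal camera translation,
# so epipolar lines are image rows and a static point keeps its y coordinate.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
pts1 = [[10, 10], [12, 20], [100, 50], [105, 60]]
pts2 = [[15, 10], [17, 20], [105, 58], [110, 69]]
boxes = [(0, 0, 50, 50), (90, 40, 130, 80)]
labels = classify_boxes(F, pts1, pts2, boxes)
print(labels)                                    # ['static', 'dynamic']
print(static_feature_mask(pts2, boxes, labels))  # [ True  True False False]
```

The outlier-ratio vote is only one simple way to turn per-feature geometric evidence into the per-object dynamic/static states the abstract describes; only features in the returned mask would then be passed on for tracking in this sketch.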

Key words: Simultaneous Localization and Mapping (SLAM), deep learning, multi-view geometry, dynamic scene, motion removal

CLC number: TP18

FANG Lijin, WANG Keqi. A Visual SLAM Algorithm for Motion Redetection Combined with Deep Learning[J]. Computer Engineering, 2022, 48(5): 18-26.

https://www.ecice06.com/CN/Y2022/V48/I5/18

Figures/Tables: 14
