
计算机工程 (Computer Engineering)


An RGB-D SLAM Algorithm Combining Adaptive Window Interval Matching and Deep Learning

  • Published: 2020-08-17

Abstract: In dynamic scenes, a traditional feature-point-based visual SLAM system is easily disturbed by moving objects: the dynamic regions of two consecutive frames produce a large number of mismatches, which lowers the robot's localization accuracy. To address this problem, this paper proposes an RGB-D SLAM algorithm for dynamic scenes that combines adaptive window interval matching with deep learning. First, a visual SLAM front-end framework based on an adaptive window interval matching model is constructed. The framework first screens the image frames, then filters matching points with grid-based probabilistic motion statistics to obtain feature correspondences in static regions, and estimates the camera pose with either a constant-velocity model or a reference-frame model. Second, the semantic information provided by the deep learning algorithm Mask R-CNN is used to build a static dense 3D map of the dynamic scene. Finally, the algorithm is validated on the TUM dataset and in a real-world environment. The experimental results show that its localization accuracy and tracking speed in dynamic scenes are better than those of ORB-SLAM2 and DynaSLAM: on a highly dynamic sequence with a total trajectory length of 6.62 m, the localization accuracy reaches 1.475 cm and the average tracking time is 0.024 s. The static dense 3D map built for the dynamic scene also makes the map reusable.
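The abstract only sketches the front-end pipeline, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes ORB features and uses the grid-based motion statistics (GMS) matcher from opencv-contrib (cv2.xfeatures2d.matchGMS) as a stand-in for the paper's "grid-based probabilistic motion statistics" step; the optional dynamic-object masks (dyn_mask_ref, dyn_mask_cur) are a hypothetical input, standing in for instance masks such as those a Mask R-CNN would provide.

import cv2
import numpy as np

def match_static_regions(img_ref, img_cur, dyn_mask_ref=None, dyn_mask_cur=None):
    """Return feature matches that survive GMS filtering and lie outside dynamic masks.

    dyn_mask_*: optional uint8 masks (255 = dynamic pixel). Treating them as
    Mask R-CNN instance masks is an assumption for this sketch; the paper only
    states that Mask R-CNN semantics are used when building the static dense map.
    """
    # Detect ORB keypoints and descriptors in the reference and current frames.
    orb = cv2.ORB_create(nfeatures=2000)
    kps_ref, des_ref = orb.detectAndCompute(img_ref, None)
    kps_cur, des_cur = orb.detectAndCompute(img_cur, None)

    # Brute-force Hamming matching, then GMS keeps only matches whose grid
    # neighbourhoods move consistently; matches on moving objects rarely do.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    raw_matches = matcher.match(des_ref, des_cur)
    gms_matches = cv2.xfeatures2d.matchGMS(
        img_ref.shape[:2][::-1], img_cur.shape[:2][::-1],
        kps_ref, kps_cur, raw_matches,
        withRotation=False, withScale=False, thresholdFactor=6.0)

    # Additionally drop matches that fall inside segmented dynamic objects.
    static_matches = []
    for m in gms_matches:
        x_ref, y_ref = map(int, kps_ref[m.queryIdx].pt)
        x_cur, y_cur = map(int, kps_cur[m.trainIdx].pt)
        if dyn_mask_ref is not None and dyn_mask_ref[y_ref, x_ref] > 0:
            continue
        if dyn_mask_cur is not None and dyn_mask_cur[y_cur, x_cur] > 0:
            continue
        static_matches.append(m)
    return kps_ref, kps_cur, static_matches

The surviving static correspondences would then feed the pose estimation stage described in the abstract, where a tracker typically predicts the new pose from a constant-velocity motion model and falls back to matching against a reference frame when that prediction fails.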