
Computer Engineering ›› 2021, Vol. 47 ›› Issue (8): 224-233. doi: 10.19678/j.issn.1000-3428.0058956

• Graphics and Image Processing •

RGB-D SLAM Algorithm Combining Adaptive Window Interval Matching and Deep Learning

YU Dongying, LIU Guihua, ZENG Weilin, FENG Bo, ZHANG Wenkai   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Received: 2020-07-16  Revised: 2020-09-06  Published: 2021-08-14

  • About the authors: YU Dongying (born 1995), male, M.S. candidate; his research interests include visual SLAM, 3D reconstruction, and image processing. LIU Guihua (corresponding author), professor, Ph.D. ZENG Weilin, FENG Bo, and ZHANG Wenkai, M.S. candidates.
  • Funding:
    Nuclear Energy Development Research Project of the State Administration of Science, Technology and Industry for National Defense, "Research on Key Technologies of Nuclear Emergency Response Robots" ([2016]1295); Key R&D Project of the Sichuan Provincial Department of Science and Technology, "SLAM Mapping and Autonomous Navigation and Obstacle Avoidance Methods for Intelligent AGV Vehicles" (2021YFG0380).

Abstract: In dynamic scenes, traditional visual SLAM systems based on the feature-point method are easily disturbed by dynamic objects, which produce a large number of mismatches in the dynamic-object regions between consecutive frames and thus significantly reduce robot positioning accuracy. To address this problem, an RGB-D SLAM algorithm that combines an adaptive window interval matching model with a deep learning algorithm is proposed for dynamic scenes. A front-end framework for visual SLAM is constructed on the adaptive window interval matching model: the framework selects image frames and filters matched points with grid-based probabilistic motion statistics to obtain matched feature-point pairs in static regions, and then applies the constant-speed model or the reference-frame model for pose estimation. On this basis, the semantic information provided by the deep learning algorithm Mask R-CNN is used to build a static 3D dense map of the dynamic scene. Experiments on the TUM dataset and in a real-world environment show that the positioning accuracy and tracking speed of the algorithm are better than those of ORB-SLAM2 and DynaSLAM in dynamic scenes: in a highly dynamic scene with a total path length of 6.62 m, the positioning accuracy reaches 1.475 cm, and the average tracking time is 0.024 s.
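The grid-based probabilistic motion statistics step in the front end can be illustrated with a minimal sketch. This is an approximation of the idea only, not the authors' implementation: the function name, grid size, and threshold factor below are assumptions.

```python
# Illustrative sketch of grid-based probabilistic motion statistics (GMS-style)
# match filtering: coherent camera motion concentrates many matches into the
# same (source cell, target cell) pair, while mismatches scatter, so a match
# is kept only if enough neighboring matches support its cell pair.
import numpy as np

def gms_filter(pts1, pts2, img_size, grid=20, alpha=6.0):
    """Keep matches whose grid-cell pair is supported by enough neighbors.

    pts1, pts2 : (N, 2) arrays of matched pixel coordinates in frames 1 and 2.
    img_size   : (width, height) of both frames.
    grid       : number of grid cells per image side (assumed value).
    alpha      : threshold factor on the support score (assumed value).
    """
    w, h = img_size
    # Assign each matched point to a flat grid-cell index in each frame.
    c1 = (pts1[:, 0] * grid // w).astype(int) * grid + (pts1[:, 1] * grid // h).astype(int)
    c2 = (pts2[:, 0] * grid // w).astype(int) * grid + (pts2[:, 1] * grid // h).astype(int)

    # Count matches per (source cell, target cell) pair.
    pair_count = {}
    for a, b in zip(c1, c2):
        pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    keep = np.zeros(len(pts1), dtype=bool)
    for i, (a, b) in enumerate(zip(c1, c2)):
        support = pair_count[(a, b)] - 1           # supporting matches, excluding self
        cell_total = np.sum(c1 == a)               # matches originating in cell a
        threshold = alpha * np.sqrt(max(cell_total, 1))
        keep[i] = support >= threshold
    return keep
```

In the paper's pipeline, only the matches that survive this filter (i.e., those in static regions) are passed on to pose estimation.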

Key words: dynamic scene, adaptive window interval matching, static region feature matching, deep learning, static 3D dense map construction
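The constant-speed (constant-velocity) model used for pose estimation in the abstract can be sketched briefly; the function name and pose convention here are assumptions for illustration, not the paper's code.

```python
# Illustrative constant-velocity pose prediction: the relative motion between
# the last two frames is replayed once more to predict the current pose.
import numpy as np

def predict_pose(T_prev2, T_prev1):
    """Predict the current camera pose from the last two poses.

    Poses are 4x4 homogeneous transforms (assumed convention). Under a
    constant-velocity assumption:
        V      = T_prev1 @ inv(T_prev2)   # last inter-frame motion
        T_pred = V @ T_prev1              # replay it once more
    """
    V = T_prev1 @ np.linalg.inv(T_prev2)
    return V @ T_prev1
```

When this prediction fails to track (e.g., large dynamic disturbance), systems of this kind typically fall back to a reference-frame model, matching against a recent keyframe instead.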

