
Computer Engineering ›› 2021, Vol. 47 ›› Issue (8): 224-233. doi: 10.19678/j.issn.1000-3428.0058956

• Graphics and Image Processing •

RGB-D SLAM Algorithm Combining Adaptive Window Interval Matching and Deep Learning

YU Dongying, LIU Guihua, ZENG Weilin, FENG Bo, ZHANG Wenkai   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Received: 2020-07-16  Revised: 2020-09-06  Published: 2021-08-14

  • About the authors: YU Dongying (born 1995), male, M.S. candidate; his research interests include visual SLAM, 3D reconstruction, and image processing. LIU Guihua (corresponding author), professor, Ph.D. ZENG Weilin, FENG Bo, and ZHANG Wenkai, M.S. candidates.
  • Funding:
    Nuclear Energy Development Research Project of the State Administration of Science, Technology and Industry for National Defense, "Research on Key Technologies of Nuclear Emergency Response Robots" ([2016]1295); Key R&D Project of the Sichuan Provincial Department of Science and Technology, "SLAM Mapping and Autonomous Navigation and Obstacle Avoidance Methods for Intelligent AGV Vehicles" (2021YFG0380).

Abstract: In dynamic scenes, traditional visual SLAM systems based on the feature-point method are easily disturbed by dynamic objects, which produce a large number of mismatches in the dynamic-object regions between consecutive frames and thus significantly reduce robot positioning accuracy. To address this problem, an RGB-D SLAM algorithm that combines an adaptive window interval matching model with a deep learning algorithm is proposed for dynamic scenes. A front-end framework for visual SLAM is constructed on the adaptive window interval matching model: the framework selects image frames and filters matched points with grid-based probabilistic motion statistics to obtain matched feature-point pairs in static regions, and then applies the constant-speed model or the reference-frame model for pose estimation. On this basis, the semantic information provided by the deep learning algorithm Mask R-CNN is used to build a static 3D dense map of the dynamic scene. Experiments on the TUM dataset and in a real-world environment show that the positioning accuracy and tracking speed of the algorithm are better than those of ORB-SLAM2 and DynaSLAM in dynamic scenes: in a highly dynamic scene with a total path length of 6.62 m, the positioning accuracy reaches 1.475 cm, and the average tracking time is 0.024 s.
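The grid-based probabilistic motion statistics step in the front end can be illustrated with a minimal sketch. This is an approximation of the idea only, not the authors' implementation: the function name, grid size, and threshold factor below are assumptions.

```python
# Illustrative sketch of grid-based probabilistic motion statistics (GMS-style)
# match filtering: coherent camera motion concentrates many matches into the
# same (source cell, target cell) pair, while mismatches scatter, so a match
# is kept only if enough neighboring matches support its cell pair.
import numpy as np

def gms_filter(pts1, pts2, img_size, grid=20, alpha=6.0):
    """Keep matches whose grid-cell pair is supported by enough neighbors.

    pts1, pts2 : (N, 2) arrays of matched pixel coordinates in frames 1 and 2.
    img_size   : (width, height) of both frames.
    grid       : number of grid cells per image side (assumed value).
    alpha      : threshold factor on the support score (assumed value).
    """
    w, h = img_size
    # Assign each matched point to a flat grid-cell index in each frame.
    c1 = (pts1[:, 0] * grid // w).astype(int) * grid + (pts1[:, 1] * grid // h).astype(int)
    c2 = (pts2[:, 0] * grid // w).astype(int) * grid + (pts2[:, 1] * grid // h).astype(int)

    # Count matches per (source cell, target cell) pair.
    pair_count = {}
    for a, b in zip(c1, c2):
        pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    keep = np.zeros(len(pts1), dtype=bool)
    for i, (a, b) in enumerate(zip(c1, c2)):
        support = pair_count[(a, b)] - 1           # supporting matches, excluding self
        cell_total = np.sum(c1 == a)               # matches originating in cell a
        threshold = alpha * np.sqrt(max(cell_total, 1))
        keep[i] = support >= threshold
    return keep
```

In the paper's pipeline, only the matches that survive this filter (i.e., those in static regions) are passed on to pose estimation.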

Key words: dynamic scene, adaptive window interval matching, static region feature matching, deep learning, static 3D dense map construction
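The constant-speed (constant-velocity) model used for pose estimation in the abstract can be sketched briefly; the function name and pose convention here are assumptions for illustration, not the paper's code.

```python
# Illustrative constant-velocity pose prediction: the relative motion between
# the last two frames is replayed once more to predict the current pose.
import numpy as np

def predict_pose(T_prev2, T_prev1):
    """Predict the current camera pose from the last two poses.

    Poses are 4x4 homogeneous transforms (assumed convention). Under a
    constant-velocity assumption:
        V      = T_prev1 @ inv(T_prev2)   # last inter-frame motion
        T_pred = V @ T_prev1              # replay it once more
    """
    V = T_prev1 @ np.linalg.inv(T_prev2)
    return V @ T_prev1
```

When this prediction fails to track (e.g., large dynamic disturbance), systems of this kind typically fall back to a reference-frame model, matching against a recent keyframe instead.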

