
计算机工程 (Computer Engineering)


An RGB-D SLAM Algorithm Combining Adaptive Window Interval Matching and Deep Learning

  • Published: 2020-08-17

Abstract: In dynamic scenes, a traditional feature-point-based visual SLAM system is easily disturbed by moving objects: the dynamic regions of two consecutive frames produce a large number of mismatches, which lowers the robot's localization accuracy. To address this problem, this paper proposes an RGB-D SLAM algorithm for dynamic scenes that combines adaptive window interval matching with deep learning. First, a visual SLAM front-end framework based on an adaptive window interval matching model is constructed. The framework first screens the image frames, then filters matching points with grid-based probabilistic motion statistics to obtain feature correspondences in static regions, and estimates the camera pose with either a constant-velocity model or a reference-frame model. Second, the semantic information provided by the deep learning algorithm Mask R-CNN is used to build a static dense 3D map of the dynamic scene. Finally, the algorithm is validated on the TUM dataset and in a real-world environment. The experimental results show that its localization accuracy and tracking speed in dynamic scenes are better than those of ORB-SLAM2 and DynaSLAM: on a highly dynamic sequence with a total trajectory length of 6.62 m, the localization accuracy reaches 1.475 cm and the average tracking time is 0.024 s. The static dense 3D map built for the dynamic scene also makes the map reusable.
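The abstract only sketches the front-end pipeline, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes ORB features and uses the grid-based motion statistics (GMS) matcher from opencv-contrib (cv2.xfeatures2d.matchGMS) as a stand-in for the paper's "grid-based probabilistic motion statistics" step; the optional dynamic-object masks (dyn_mask_ref, dyn_mask_cur) are a hypothetical input, standing in for instance masks such as those a Mask R-CNN would provide.

import cv2
import numpy as np

def match_static_regions(img_ref, img_cur, dyn_mask_ref=None, dyn_mask_cur=None):
    """Return feature matches that survive GMS filtering and lie outside dynamic masks.

    dyn_mask_*: optional uint8 masks (255 = dynamic pixel). Treating them as
    Mask R-CNN instance masks is an assumption for this sketch; the paper only
    states that Mask R-CNN semantics are used when building the static dense map.
    """
    # Detect ORB keypoints and descriptors in the reference and current frames.
    orb = cv2.ORB_create(nfeatures=2000)
    kps_ref, des_ref = orb.detectAndCompute(img_ref, None)
    kps_cur, des_cur = orb.detectAndCompute(img_cur, None)

    # Brute-force Hamming matching, then GMS keeps only matches whose grid
    # neighbourhoods move consistently; matches on moving objects rarely do.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    raw_matches = matcher.match(des_ref, des_cur)
    gms_matches = cv2.xfeatures2d.matchGMS(
        img_ref.shape[:2][::-1], img_cur.shape[:2][::-1],
        kps_ref, kps_cur, raw_matches,
        withRotation=False, withScale=False, thresholdFactor=6.0)

    # Additionally drop matches that fall inside segmented dynamic objects.
    static_matches = []
    for m in gms_matches:
        x_ref, y_ref = map(int, kps_ref[m.queryIdx].pt)
        x_cur, y_cur = map(int, kps_cur[m.trainIdx].pt)
        if dyn_mask_ref is not None and dyn_mask_ref[y_ref, x_ref] > 0:
            continue
        if dyn_mask_cur is not None and dyn_mask_cur[y_cur, x_cur] > 0:
            continue
        static_matches.append(m)
    return kps_ref, kps_cur, static_matches

The surviving static correspondences would then feed the pose estimation stage described in the abstract, where a tracker typically predicts the new pose from a constant-velocity motion model and falls back to matching against a reference frame when that prediction fails.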