
Computer Engineering

   

A Review of VSLAM Technology Empowered by Deep Learning from a Problem-Driven Perspective

  

  • Online: 2026-03-17  Published: 2026-03-17


Abstract: Visual Simultaneous Localization and Mapping (VSLAM) is a core technology in mobile robotics. Traditional VSLAM methods rely primarily on hand-crafted features and geometric constraints, and they face numerous challenges in complex environments. In recent years, deep learning-based approaches have offered new ways to address these challenges. This paper reviews research progress in deep learning-based VSLAM from a problem-driven perspective. First, it introduces the basic VSLAM system framework and analyzes the main challenges the technology faces. The review then focuses on three key problems. For dynamic interference, it analyzes dynamic detection methods based on semantic segmentation and on semantic-geometric fusion. For illumination variation, it systematically reviews robust front-end designs based on image enhancement, exposure control, and learned feature extraction. For lightweight, real-time deployment, it discusses the application of network model compression and hardware acceleration on edge devices. It also briefly discusses representative solutions to texture deficiency, fast motion, scale uncertainty, large-scale environments, and long-term operation. Starting from the key problems that limit VSLAM performance in practical applications, the paper builds a problem-driven analysis framework and reveals how the applicability of different technical routes differs across complex scenarios. Finally, it summarizes common evaluation metrics and public datasets, draws conclusions, and outlines future research directions.
