
计算机工程 ›› 2024, Vol. 50 ›› Issue (11): 130-141. doi: 10.19678/j.issn.1000-3428.0068503

• Artificial Intelligence and Pattern Recognition •

  • Funding:
    Key Scientific Research Project of the Education Department of Anhui Province (KJ2020A0364); General Program of the National Natural Science Foundation of China (62073101)

Relocalization Network with an Element-wise Attention Mechanism and Corner Features

CAO Chuqing1,2,*, LUO Hainan1,2, MA Yujie1,2

  1. School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, Anhui, China
    2. Yangtze River Delta Hart Robotics Industrial Technology Research Institute, Wuhu 241000, Anhui, China
  • Received: 2023-10-07 Online: 2024-11-15 Published: 2024-03-15
  • Contact: CAO Chuqing


Abstract:

Visual relocalization, an essential technique for indoor service robots, aims to recover the six-Degree-Of-Freedom (6-DOF) pose of a robot. However, the numerous textureless regions in indoor environments pose a challenge to accurate visual relocalization, because similar image patches in these regions significantly disturb the relocalization accuracy. In addition, current visual relocalization networks ignore the importance of corners, which means that their abundant geometric features are not fully leveraged; this limits the encoding capability of a network for scene information. To resolve these issues, this paper proposes a novel visual relocalization network combining an element-wise attention mechanism and corner features. First, to solve the problems caused by similar image patches, the network introduces an element-wise attention mechanism that predicts element-wise weighting factors to measure the importance of each element in the feature maps. Multi-level features are then fused effectively, and the structural information in low-level features and the semantic information in high-level features are leveraged to distinguish similar image patches. Second, to address the neglect of corner features, the network introduces a corner-feature integration module that clusters numerous corners extracted by SuperPoint and selects the corners closest to the cluster centers to ensure a uniform corner distribution. The network then integrates the corner features into high-dimensional features, which ensures adequate extraction of the geometric features contained in the corners and thus boosts the scene-parsing capability of the network. Experimental results on the 7-Scenes dataset demonstrate that, in indoor scenes with extensive textureless regions, distinguishing similar image patches and integrating corner features effectively boost relocalization accuracy: the method achieves a median translation error of 0.025 m, a median rotation error of 0.83°, and a relocalization accuracy of 87.43%.
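The element-wise fusion described in the abstract blends a low-level feature map (geometric structure) with a high-level feature map (semantics) using a per-element weight in (0, 1). The paper does not publish its implementation; the following is only a minimal NumPy sketch of the idea, in which the weight logits are taken as given (in the actual network a small prediction branch would produce them):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elementwise_attention_fuse(low_feat, high_feat, weight_logits):
    """Fuse low- and high-level feature maps of identical shape.

    Each element gets its own attention weight w in (0, 1), so the fused
    map is a per-element convex combination of the two inputs.
    """
    w = sigmoid(weight_logits)          # element-wise attention weights
    return w * low_feat + (1.0 - w) * high_feat

# Toy 1x4x4 "feature maps" standing in for real network activations.
rng = np.random.default_rng(0)
low = rng.standard_normal((1, 4, 4))
high = rng.standard_normal((1, 4, 4))
logits = rng.standard_normal((1, 4, 4))
fused = elementwise_attention_fuse(low, high, logits)
```

Because the weights are a convex combination per element, every fused value lies between the corresponding low- and high-level values; the learning problem is then to predict logits that emphasize whichever level better distinguishes similar image patches at that location.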

Key words: visual relocalization, indoor service robots, convolutional neural network, multi-level feature fusion, corner features
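The corner-selection step in the abstract clusters detected corners and keeps, for each cluster, the detected corner nearest the cluster center, so the kept corners spread evenly over the image. The abstract does not name the clustering algorithm; this sketch assumes a small k-means, implemented directly in NumPy, and uses random points in place of SuperPoint detections:

```python
import numpy as np

def select_spread_corners(corners, k, iters=10, seed=0):
    """Pick k well-spread corners from an (N, 2) array of pixel coords.

    Runs a few k-means iterations over the coordinates, then returns the
    detected corner nearest each cluster center (a real detected corner,
    not the center itself, which may fall between detections).
    """
    rng = np.random.default_rng(seed)
    centers = corners[rng.choice(len(corners), k, replace=False)]
    for _ in range(iters):
        # Assign each corner to its nearest center, then recompute centers.
        d = np.linalg.norm(corners[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = corners[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    d = np.linalg.norm(corners[:, None] - centers[None], axis=2)
    return np.array([corners[d[:, j].argmin()] for j in range(k)])

# Random 2-D points stand in for SuperPoint corner detections here.
rng = np.random.default_rng(1)
corners = rng.uniform(0, 480, size=(200, 2))
selected = select_spread_corners(corners, k=8)
```

Selecting the detection nearest each center, rather than the center itself, keeps the chosen points on actual image corners while still enforcing the roughly uniform spatial distribution the paper relies on.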