
计算机工程 ›› 2024, Vol. 50 ›› Issue (11): 130-141. doi: 10.19678/j.issn.1000-3428.0068503

• Artificial Intelligence and Pattern Recognition •

  • Funding:
    Key Scientific Research Project of the Education Department of Anhui Province (KJ2020A0364); General Program of the National Natural Science Foundation of China (62073101)

Relocalization Network with an Element-wise Attention Mechanism and Corner Features

CAO Chuqing1,2,*, LUO Hainan1,2, MA Yujie1,2

  1. School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, Anhui, China
    2. Yangtze River Delta Hart Robotics Industrial Technology Research Institute, Wuhu 241000, Anhui, China
  • Received: 2023-10-07 Online: 2024-11-15 Published: 2024-03-15
  • Contact: CAO Chuqing


Abstract:

Visual relocalization, an essential technique for indoor service robots, aims to recover the six-Degree-Of-Freedom (6-DOF) pose of a robot. However, the numerous textureless regions in indoor environments pose a challenge to accurate visual relocalization, because similar image patches in these regions significantly disturb the relocalization accuracy. In addition, current visual relocalization networks ignore the importance of corners, which means that their abundant geometric features are not fully leveraged; this limits the encoding capability of a network for scene information. To resolve these issues, this paper proposes a novel visual relocalization network combining an element-wise attention mechanism and corner features. First, to solve the problems caused by similar image patches, the network introduces an element-wise attention mechanism that predicts element-wise weighting factors to measure the importance of each element in the feature maps. Multi-level features are then fused effectively, and the structural information in low-level features and the semantic information in high-level features are leveraged to distinguish similar image patches. Second, to address the neglect of corner features, the network introduces a corner-feature integration module that clusters numerous corners extracted by SuperPoint and selects the corners closest to the cluster centers to ensure a uniform corner distribution. The network then integrates the corner features into high-dimensional features, which ensures adequate extraction of the geometric features contained in the corners and thus boosts the scene-parsing capability of the network. Experimental results on the 7-Scenes dataset demonstrate that, in indoor scenes with extensive textureless regions, distinguishing similar image patches and integrating corner features effectively boost relocalization accuracy: the method achieves a median translation error of 0.025 m, a median rotation error of 0.83°, and a relocalization accuracy of 87.43%.
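The element-wise fusion described in the abstract blends a low-level feature map (geometric structure) with a high-level feature map (semantics) using a per-element weight in (0, 1). The paper does not publish its implementation; the following is only a minimal NumPy sketch of the idea, in which the weight logits are taken as given (in the actual network a small prediction branch would produce them):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elementwise_attention_fuse(low_feat, high_feat, weight_logits):
    """Fuse low- and high-level feature maps of identical shape.

    Each element gets its own attention weight w in (0, 1), so the fused
    map is a per-element convex combination of the two inputs.
    """
    w = sigmoid(weight_logits)          # element-wise attention weights
    return w * low_feat + (1.0 - w) * high_feat

# Toy 1x4x4 "feature maps" standing in for real network activations.
rng = np.random.default_rng(0)
low = rng.standard_normal((1, 4, 4))
high = rng.standard_normal((1, 4, 4))
logits = rng.standard_normal((1, 4, 4))
fused = elementwise_attention_fuse(low, high, logits)
```

Because the weights are a convex combination per element, every fused value lies between the corresponding low- and high-level values; the learning problem is then to predict logits that emphasize whichever level better distinguishes similar image patches at that location.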

Key words: visual relocalization, indoor service robots, convolutional neural network, multi-level feature fusion, corner features
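The corner-selection step in the abstract clusters detected corners and keeps, for each cluster, the detected corner nearest the cluster center, so the kept corners spread evenly over the image. The abstract does not name the clustering algorithm; this sketch assumes a small k-means, implemented directly in NumPy, and uses random points in place of SuperPoint detections:

```python
import numpy as np

def select_spread_corners(corners, k, iters=10, seed=0):
    """Pick k well-spread corners from an (N, 2) array of pixel coords.

    Runs a few k-means iterations over the coordinates, then returns the
    detected corner nearest each cluster center (a real detected corner,
    not the center itself, which may fall between detections).
    """
    rng = np.random.default_rng(seed)
    centers = corners[rng.choice(len(corners), k, replace=False)]
    for _ in range(iters):
        # Assign each corner to its nearest center, then recompute centers.
        d = np.linalg.norm(corners[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = corners[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    d = np.linalg.norm(corners[:, None] - centers[None], axis=2)
    return np.array([corners[d[:, j].argmin()] for j in range(k)])

# Random 2-D points stand in for SuperPoint corner detections here.
rng = np.random.default_rng(1)
corners = rng.uniform(0, 480, size=(200, 2))
selected = select_spread_corners(corners, k=8)
```

Selecting the detection nearest each center, rather than the center itself, keeps the chosen points on actual image corners while still enforcing the roughly uniform spatial distribution the paper relies on.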