
Computer Engineering (计算机工程) ›› 2023, Vol. 49 ›› Issue (10): 22-30. doi: 10.19678/j.issn.1000-3428.0066469

• Hot Topics and Reviews •

Research on Key Point Estimation of Interactive Pen Based on Deep Learning (基于深度学习的交互笔关键点估计研究)

Xingshuai ZHU (朱兴帅)1,2, Bin YE (叶彬)1,2, Kang YAO (姚康)1,2, Shangshang DING (丁上上)1,2, Daoliang XU (徐道亮)1,2, Weiwei FU (付威威)1,2,*

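  1. School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Suzhou 215000, Jiangsu, China
  2. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215000, Jiangsu, China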
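  • Received: 2022-12-08 Online: 2023-10-15 Published: 2023-10-10
  • Corresponding author: Weiwei FU
  • About the authors:

    Xingshuai ZHU (born 1997), male, master's student; main research interests: computer vision and image processing

    Bin YE, master's student

    Kang YAO, master's student

    Shangshang DING, master's student

    Daoliang XU, doctoral student

  • Funding:
    Youth Innovation Promotion Association of the Chinese Academy of Sciences (E1290301)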

Research on Key Point Estimation of Interactive Pen Based on Deep Learning

Xingshuai ZHU1,2, Bin YE1,2, Kang YAO1,2, Shangshang DING1,2, Daoliang XU1,2, Weiwei FU1,2,*   

  1. School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Suzhou 215000, Jiangsu, China
  2. Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215000, Jiangsu, China
  • Received: 2022-12-08 Online: 2023-10-15 Published: 2023-10-10
  • Contact: Weiwei FU

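Abstract (translated from the Chinese):

Virtual reality technology is applied in many fields, but existing interaction methods cannot meet users' need for fine-grained operation. An interactive pen enables precise input in 3D space and improves productivity. This study designs PKPD-Net, a two-stage key point estimation model for an interactive pen based on monocular RGB images. A CBAM-SHN network yields the 2D key point information, and the 2D pose features of the pen are then used to regress the 3D positions of the key points. The model uses CBAM modules to improve the fusion scheme, Offset-based sub-pixel localization of key points, and auxiliary prediction of hand key points to achieve high-precision 3D estimation of the key points on the pen, providing accurate position information for fine-grained operation with an interactive pen. Experiments and validation on a large-scale dataset show that, compared with the Minimal-hand and HOPE-Net models, the mean_EPE of the key points predicted by the model is reduced by 0.882 mm and 0.710 mm, respectively, and PSF@4 is improved by 31.38 and 32.31 percentage points, respectively. Finally, to explore industrial-grade applications, an application is developed on top of PKPD-Net that reconstructs the operating trajectory through temporal association.

Keywords: virtual reality, fine-grained operation, deep learning, key point estimation, feature fusion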
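The abstract does not give the formula for Offset-based sub-pixel localization, so the following is only a minimal sketch of the common heatmap-plus-offset decoding scheme, not the authors' implementation. The function name `decode_keypoint`, the separate `offset_x` and `offset_y` maps, and the `stride` of 4 are assumptions made for illustration.

```python
def decode_keypoint(heatmap, offset_x, offset_y, stride=4.0):
    """Return (x, y) in input-image pixels for one key point.

    heatmap, offset_x, offset_y: equally sized 2D lists (rows of floats).
    stride: assumed downsampling factor between input image and heatmap.
    """
    # Coarse location: the heatmap cell with the highest score.
    best_row, best_col, best_score = 0, 0, float("-inf")
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_row, best_col, best_score = r, c, score
    # Fine location: add the fractional offset predicted for that cell,
    # then map back to input-image resolution.
    x = (best_col + offset_x[best_row][best_col]) * stride
    y = (best_row + offset_y[best_row][best_col]) * stride
    return x, y


# Toy 4x4 maps (hypothetical values): peak at row 1, column 2.
heatmap = [[0.0] * 4 for _ in range(4)]
heatmap[1][2] = 0.9
offset_x = [[0.25] * 4 for _ in range(4)]
offset_y = [[0.5] * 4 for _ in range(4)]
print(decode_keypoint(heatmap, offset_x, offset_y))  # (9.0, 6.0)
```

Without the offset term, the decoded position could only land on multiples of the stride, which is why a learned fractional offset helps when millimetre-level accuracy is the goal.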

Abstract:

Virtual Reality (VR) is widely used in various fields. However, existing interaction methods cannot meet users' need for fine-grained operation. An interactive pen enables precise input in 3D space and improves productivity. This study therefore proposes PKPD-Net, a two-stage key point estimation model for an interactive pen based on a single monocular RGB image. The 2D key points are first estimated using the Convolutional Block Attention Module-Stacked Hourglass Network (CBAM-SHN). The 3D positions of the key points are then regressed from the 2D pose features of the pen. The model combines an improved fusion method based on CBAM modules, Offset-based sub-pixel localization of key points, and auxiliary prediction of hand key points. As a result, it achieves high-precision estimation of the 3D key points on the pen and provides accurate position information for fine-grained operation with an interactive pen. Training and testing are performed on a large-scale dataset. Compared with the Minimal-hand and HOPE-Net models, PKPD-Net reduces the mean End Point Error (mean EPE) of the key points by 0.882 mm and 0.710 mm, respectively, and raises the Percentage of Success Frame with less than 4 mm (PSF@4) by 31.38 and 32.31 percentage points, respectively. On these metrics, the proposed method outperforms both baselines. Finally, to explore product applications, PKPD-Net is used to recover the operating trajectory through temporal association.

Key words: Virtual Reality (VR), refined operation, deep learning, key point estimation, feature fusion
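The two reported metrics, mean EPE and PSF@4, can be illustrated with a short sketch. It assumes that EPE is the Euclidean distance in millimetres between predicted and ground-truth 3D key points, and that a frame counts as a success when its mean key point error is below the threshold. The abstract does not state the exact per-frame criterion, so that rule and the function names are assumptions.

```python
import math


def frame_errors(pred_frame, gt_frame):
    """Euclidean error (mm) of each key point in one frame."""
    return [math.dist(p, g) for p, g in zip(pred_frame, gt_frame)]


def mean_epe(pred, gt):
    """Mean end point error over all key points in all frames."""
    errors = [e for pf, gf in zip(pred, gt) for e in frame_errors(pf, gf)]
    return sum(errors) / len(errors)


def psf_at(pred, gt, threshold_mm=4.0):
    """Percentage of frames whose mean key point error is below the threshold.

    Assumed success rule; the paper may define frame success differently.
    """
    successes = 0
    for pf, gf in zip(pred, gt):
        errs = frame_errors(pf, gf)
        if sum(errs) / len(errs) < threshold_mm:
            successes += 1
    return 100.0 * successes / len(pred)


# Hypothetical data: 2 frames, 2 key points each, coordinates in mm.
gt = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
pred = [[(3, 0, 0), (0, 0, 0)], [(0, 4, 3), (6, 8, 0)]]
print(mean_epe(pred, gt))  # 4.5
print(psf_at(pred, gt))    # 50.0
```

In this toy data the per-key-point errors are 3, 0, 5, and 10 mm, so the first frame (mean 1.5 mm) passes the 4 mm threshold and the second (mean 7.5 mm) does not.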