
Computer Engineering


Graph Convolution and Learnable Query-Based Bone Length Correction for 3D Human Pose Estimation


  • Published: 2026-01-13


Abstract: To address the large errors in 2D keypoint detection and the limited ability of existing models to capture spatial structural relationships among joints, we propose a 3D human pose estimation model based on Graph Convolutional Cross-Fusion Attention (GCCFA). The model first introduces a graph convolutional network (GCN) module that captures skeletal topology from both local and global perspectives, strengthening the structural constraints and representational capacity among joints. A learnable query fusion module then dynamically selects and fuses keypoint features through cross-attention, improving feature discriminability and robustness. Finally, a Transformer-based bone-length correction post-processing step adaptively learns the distribution of bone lengths from the training data, refining the initial 3D estimates and effectively mitigating pose deviations caused by 2D detection errors. On the Human3.6M dataset, the model achieves a P1 error of 38.4 mm and a P2 error of 30.4 mm after bone-length correction, reaching state-of-the-art performance. Additional evaluations on the MPI-INF-3DHP dataset further verify the effectiveness of the proposed method.
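To illustrate the cross-attention mechanism behind the learnable query fusion module, the following is a minimal NumPy sketch: a set of learnable query vectors attends over per-joint features, producing a weighted mixture for each query. The function names, the feature dimension, and the use of raw matrices in place of trained parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_fusion(queries, joint_feats):
    """Cross-attention of learnable queries over per-joint keypoint features.

    queries:     (Q, d) learnable query vectors (random here for illustration)
    joint_feats: (J, d) per-joint 2D keypoint features
    returns:     (Q, d) fused features, one per query
    """
    d = queries.shape[1]
    scores = queries @ joint_feats.T / np.sqrt(d)   # (Q, J) scaled dot products
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ joint_feats                    # convex mixture of features
```

Because each query's attention weights form a probability distribution over the joints, the module can softly "select" which keypoints contribute to each fused feature, which is the dynamic selection behavior the abstract describes.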
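The bone-length correction idea can also be sketched in a few lines: given an initial 3D pose and a set of corrected bone lengths (which in the paper come from a Transformer trained on the bone-length distribution, but are simply given here), each bone is rescaled to its target length while preserving its direction. The parent array assumes a Human3.6M-style 17-joint skeleton; the exact joint ordering is an illustrative assumption.

```python
import numpy as np

# Parent index of each joint in an assumed Human3.6M-style 17-joint skeleton
# (joint 0 is the root pelvis; every parent index precedes its children).
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]

def bone_lengths(pose):
    """Lengths of the 16 bones of a (17, 3) pose; bone j-1 joins joint j to its parent."""
    return np.array([np.linalg.norm(pose[j] - pose[PARENTS[j]])
                     for j in range(1, len(PARENTS))])

def correct_pose(pose, target_lengths):
    """Rescale each bone to its target length, keeping the original direction.

    Joints are visited in index order, which is topological here (parents
    precede children), so each child is re-anchored at its already-corrected
    parent position.
    """
    out = pose.copy()
    for j in range(1, len(PARENTS)):
        p = PARENTS[j]
        direction = pose[j] - pose[p]
        norm = np.linalg.norm(direction)
        if norm > 1e-8:                      # skip degenerate zero-length bones
            out[j] = out[p] + direction / norm * target_lengths[j - 1]
    return out
```

Because only bone lengths change while directions are kept, this post-processing cannot alter joint angles; it only pulls the skeleton toward anatomically plausible proportions, which is how a length prior can offset errors inherited from the 2D detector.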
