
Computer Engineering ›› 2025, Vol. 51 ›› Issue (10): 97-110. doi: 10.19678/j.issn.1000-3428.0069738

• Artificial Intelligence and Pattern Recognition •

Geometry Relationship-aware Representation Learning Model for Human Pose Estimation in Teaching Scenarios

LIU Hai1,2, ZHU Junyan1, ZHANG Zhaoli1,*, ZHOU Qiyun1, SONG Yunxiao1

  1. Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430000, Hubei, China
    2. Shenzhen Research Institute of Central China Normal University, Shenzhen 518000, Guangdong, China
  • Received: 2024-04-12 Revised: 2024-06-22 Online: 2025-10-15 Published: 2025-10-29
  • Contact: ZHANG Zhaoli

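  • Supported by:
    Ministry of Science and Technology 2021 Key Special Project "Science and Technology Support for Social Governance and Smart Society" (2021YFC3340802); National Natural Science Foundation of China (6247077114); National Natural Science Foundation of China (62377037); National Natural Science Foundation of China (62277041); Natural Science Foundation of Jiangxi Province (20242BAB2S107); Natural Science Foundation of Jiangxi Province (20232BAB212026); Jiangxi Provincial Higher Education Teaching Reform Research Project (JXJG-23-27-6); General Program of the Shenzhen Natural Science Foundation (JCYJ20230807152900001); Joint Fund for Innovation and Development of the Hubei Provincial Natural Science Foundation (2025AFD621); Guangdong Basic and Applied Basic Research Foundation (2025A1515010266); 2024 Science and Technology Research Program of the Hubei Provincial Department of Education (B2023300); Fundamental Research Funds for the Central Universities, Central China Normal University (CCNU25ai012)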

Abstract:

Human Pose Estimation (HPE) is an important research task in computer vision and is widely used in teaching scenarios. The task still faces several challenges: accuracy drops in complex scenes with cluttered backgrounds, small-scale human figures, or occluded bodies, and the flexibility and variability of human poses demand strong inference ability from the model. To address these problems, this study proposes a geometric relationship-aware representation learning model for human pose estimation. The model exploits the structural information of the human body to better understand the relationships between different poses, improving the accuracy and robustness of complex pose prediction so that it can be applied effectively in classroom scenarios. The model comprises four modules: channel reweighting, multi-token information interaction, limb direction construction, and adaptive loss propagation. The limb direction construction module models the geometric structure among human body joints; this input cue helps the model capture the relative positions and orientations of body parts. The channel reweighting module automatically selects and emphasizes the features most helpful for pose estimation, strengthening the representational power of the visual features of the input image. The multi-token information interaction module, built on a Transformer encoder, enables effective interaction among image feature cues, joint coordinate cues, and limb direction cues. Finally, the adaptive loss propagation module refines the conventional loss function to further improve training and model performance. The model achieves accuracies of 76.1% and 90.3% on the two mainstream datasets COCO and MPII, respectively, outperforming some existing State-of-the-Art (SOTA) models, and produces more accurate and plausible predictions in complex scenarios.
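
The limb direction cue described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so everything here is an assumption made for illustration: the toy skeleton `LIMBS`, the function name `limb_directions`, and the choice to encode each limb as a 2D unit vector pointing from one joint to the next.

```python
import math

# Hypothetical skeleton: (parent, child) joint-index pairs for a 5-joint toy chain.
# The limb definition used in the paper is not stated in the abstract.
LIMBS = [(0, 1), (1, 2), (2, 3), (3, 4)]


def limb_directions(joints, limbs=LIMBS, eps=1e-8):
    """Return one unit vector per limb, pointing from parent joint to child joint.

    joints: list of (x, y) coordinates, indexed by joint id.
    A limb whose two joints coincide gets the zero vector instead of a direction.
    """
    directions = []
    for parent, child in limbs:
        dx = joints[child][0] - joints[parent][0]
        dy = joints[child][1] - joints[parent][1]
        length = math.hypot(dx, dy)
        if length < eps:
            # Degenerate limb: no meaningful direction can be defined.
            directions.append((0.0, 0.0))
        else:
            directions.append((dx / length, dy / length))
    return directions


# Toy pose: the last two joints coincide, so the last limb is degenerate.
joints = [(0.0, 0.0), (3.0, 4.0), (3.0, 9.0), (6.0, 13.0), (6.0, 13.0)]
dirs = limb_directions(joints)
```

Normalizing to unit length makes the cue depend only on orientation, not on how large the person appears in the image, which is one plausible reason to use directions alongside raw joint coordinates for small-scale figures.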

Key words: Human Pose Estimation (HPE), geometry structure cue, limb direction, Transformer, image understanding
