Computer Engineering, 2023, Vol. 49, Issue (2): 263-270. doi: 10.19678/j.issn.1000-3428.0063699

• Development Research and Engineering Application •

Animal Pose Estimation Based on Improved Stacked Hourglass Network

ZHANG Wenwen1, XU Yang1,2, BAI Rui1, CHEN Na1

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China;
  2. Guiyang Aluminum-magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Received: 2022-01-05 Revised: 2022-03-12 Published: 2022-07-05
  • About the authors: ZHANG Wenwen (born 1997), female, master's student, whose main research interests are computer vision and machine learning; XU Yang (corresponding author), associate professor, Ph.D.; BAI Rui and CHEN Na, master's students.
  • Funding:
    Guizhou Provincial Science and Technology Program (grant 黔科合支撑[2021]一般176).

Abstract: The Stacked Hourglass Network (SHN) has been applied successfully to animal pose estimation, but its encoding-decoding process tends to lose the network's shallow-layer information, which reduces detection accuracy. To address this problem, an animal pose estimation model based on an improved SHN is proposed. A multi-scale max-pooling module based on Squeeze-and-Excitation (SE) attention is designed to extract multi-scale information, mitigate the heavy loss of information after pooling, and improve the network's ability to capture global information. A multi-level feature fusion method is also proposed to fully extract and fuse feature information. On this basis, the Convolutional Block Attention Module (CBAM) is embedded to learn feature fusion weights, which improves the network's ability to extract multi-channel information, suppresses ineffective features, and lets the network extract richer and finer features. The model is trained and tested on the TigDog dataset and a synthetic animal dataset. The results show that the proposed model outperforms the Syn, BDL, CyCADA, and CC-SSL models; its PCK@0.05 scores for horses and tigers are 4.6% and 3.5% higher, respectively, than those of the second-best model, CC-SSL. Ablation experiments also support the effectiveness and advantages of the overall network architecture.

Keywords: animal pose estimation, Stacked Hourglass Network (SHN), multi-scale information extraction, attention mechanism, feature fusion
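
The abstract reports results as PCK@0.05 (Percentage of Correct Keypoints at a 0.05 threshold). As a rough illustration of how such a score can be computed, the sketch below counts a visible keypoint as correct when its predicted location lies within 0.05 times a reference length of the ground truth. The function name `pck`, the `ref_length` parameter, and the toy coordinates are assumptions made for this example; the abstract does not state which normalization length the authors use, and this is not their evaluation code.

```python
import math


def pck(pred, gt, visible, ref_length, alpha=0.05):
    """Fraction of visible keypoints predicted within alpha * ref_length.

    pred, gt   : sequences of (x, y) keypoint coordinates in pixels
    visible    : sequence of booleans, True if the keypoint is annotated
    ref_length : normalization length in pixels (assumed here to be a
                 bounding-box side; the source does not specify this)
    alpha      : threshold ratio, 0.05 for PCK@0.05
    """
    threshold = alpha * ref_length
    hits = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # unannotated keypoints are excluded from the score
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            hits += 1
    # With no visible keypoints the score is undefined
    return hits / total if total else float("nan")


# Toy data (hypothetical): four keypoints, the last one not annotated
gt = [(100.0, 100.0), (150.0, 120.0), (200.0, 180.0), (50.0, 60.0)]
pred = [(103.0, 104.0), (150.0, 140.0), (200.0, 181.0), (0.0, 0.0)]
visible = [True, True, True, False]

# With ref_length=200 the threshold is 10 pixels
score = pck(pred, gt, visible, ref_length=200.0)
print(f"PCK@0.05 = {score:.3f}")
```

In this toy case the three visible keypoints have errors of 5, 20, and 1 pixels against a 10-pixel threshold, so two of three count as correct. A dataset-level score would typically average over all annotated keypoints of all test images.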
