Research Progress of Two-Dimensional Human Pose Estimation Based on Deep Learning

doi:10.19678/j.issn.1000-3428.0058799

Abstract

Abstract: Two-dimensional Human Pose Estimation (HPE) methods based on deep learning construct a specific neural network architecture and associate the extracted features through a corresponding feature fusion method to obtain the final pose estimation result. Their wide application value has attracted considerable attention from researchers. This paper systematically reviews recent work on deep-learning-based two-dimensional human pose estimation from the perspectives of dataset benchmarks, pose estimation methods, and evaluation criteria. Existing methods are divided into single-person and multi-person pose estimation methods, and each group is analyzed and summarized in terms of network architecture design, output feature representation, and loss function selection. Finally, in light of the challenges that two-dimensional human pose estimation currently faces, the paper discusses future research directions and application prospects.

Key words: two-dimensional Human Pose Estimation (HPE), computer vision, keypoint detection, deep learning, Convolutional Neural Network (CNN)

CLC number: TP391

LIU Yong, LI Jie, ZHANG Jianlin, XU Zhiyong, WEI Yuxing. Research Progress of Two-Dimensional Human Pose Estimation Based on Deep Learning[J]. Computer Engineering, 2021, 47(3): 1-16.

https://www.ecice06.com/CN/Y2021/V47/I3/1

Figures and tables: 15

