[1] SONG L C, YU G, YUAN J S, et al. Human pose estimation and its application to action recognition:a survey[J]. Journal of Visual Communication and Image Representation, 2021, 76:103055.
[2] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the International Conference on Computer Vision. Washington D. C., USA:IEEE Press, 2017:2961-2969.
[3] XIAO B, WU H P, WEI Y C. Simple baselines for human pose estimation and tracking[EB/OL].[2023-04-21]. https://arxiv.org/pdf/1804.06208.pdf.
[4] KHIRODKAR R, CHARI V, AGRAWAL A, et al. Multi-instance pose networks:rethinking top-down pose estimation[C]//Proceedings of the International Conference on Computer Vision. Washington D. C., USA:IEEE Press, 2021:3122-3131.
[5] CAO Z, GINES H, SIMON T, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2017:7291-7299.
[6] KREISS S, BERTONI L, ALAHI A. PifPaf:composite fields for human pose estimation[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2019:11977-11986.
[7] CHENG B W, XIAO B, WANG J D, et al. HigherHRNet:scale-aware representation learning for bottom-up human pose estimation[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2020:5386-5395.
[8] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[EB/OL].[2023-04-21]. https://arxiv.org/pdf/1902.09212.pdf.
[9] WANG D, ZHANG S. Contextual instance decoupling for robust multi-person pose estimation[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2022:11060-11068.
[10] CHENG Y, AI Y H, WANG B, et al. Bottom-up 2D pose estimation via dual anatomical centers for small-scale persons[J]. Pattern Recognition, 2023, 139:109403.
[11] LIU S G, LI Y, HUA G G. Human pose estimation in video via structured space learning and halfway temporal evaluation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(7):2029-2038.
[12] DU S L, WANG H, YUAN Z W, et al. Bi-Pose:bidirectional 2D-3D transformation for human pose estimation from a monocular camera[J/OL]. IEEE Transactions on Automation Science and Engineering:1-14[2023-04-21]. https://ieeexplore.ieee.org/document/10141872.
[13] LI J N, LIANG X D, WEI Y C, et al. Perceptual generative adversarial networks for small object detection[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2017:1222-1230.
[14] KISANTAL M, WOJNA Z, MURAWSKI J, et al. Augmentation for small object detection[EB/OL].[2023-04-21]. https://arxiv.org/pdf/1902.07296.pdf.
[15] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2017:2117-2125.
[16] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[EB/OL].[2023-04-21]. https://arxiv.org/pdf/1803.01534.pdf.
[17] HU H, GU J Y, ZHANG Z, et al. Relation networks for object detection[EB/OL].[2023-04-21]. https://arxiv.org/pdf/1711.11575.pdf.
[18] JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. New York, USA:ACM Press, 2015:2017-2025.
[19] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA:ACM Press, 2017:6000-6010.
[20] LI Y J, ZHANG S K, WANG Z C, et al. TokenPose:learning keypoint tokens for human pose estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D. C., USA:IEEE Press, 2021:11313-11322.
[21] SHI D, WEI X, LI L, et al. End-to-end multi-person pose estimation with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2022:11069-11078.
[22] LIU H, WU H L, FANG Y X. Efficient human pose estimation combining global contextual information[J]. Computer Engineering, 2023, 49(7):102-109, 117. (in Chinese)
[23] WANG K, XUAN S B, HE X D, et al. Cross attention transformer for human pose estimation[J/OL]. Computer Engineering:1-10[2023-06-27]. DOI:10.19678/j.issn.1000-3428.0065330. (in Chinese)
[24] ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[EB/OL].[2023-04-21]. https://arxiv.org/abs/1904.07850v1.
[25] SOFIIUK K, BARINOVA O, KONUSHIN A. AdaptIS:adaptive instance selection network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D. C., USA:IEEE Press, 2019:7355-7363.
[26] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2):318-327.
[27] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany:Springer, 2020:213-229.
[28] DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[EB/OL].[2023-04-21]. https://arxiv.org/pdf/1703.06211.pdf.
[29] HE K M, FAN H Q, WU Y X, et al. Momentum contrast for unsupervised visual representation learning[EB/OL].[2023-04-21]. https://arxiv.org/abs/1911.05722v3.
[30] YU Q, WANG H K, QIAO S, et al. K-means mask transformer[C]//Proceedings of the 17th European Conference on Computer Vision. Berlin, Germany:Springer, 2022:288-307.
[31] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words:transformers for image recognition at scale[EB/OL].[2023-04-21]. https://arxiv.org/abs/2010.11929v1.
[32] LUO Z X, WANG Z C, HUANG Y, et al. Rethinking the heatmap regression for bottom-up human pose estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2021:13264-13273.
[33] GENG Z G, SUN K, XIAO B, et al. Bottom-up human pose estimation via disentangled keypoint regression[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA:IEEE Press, 2021:14676-14686.
[34] XUE N, WU T F, XIA G S, et al. Learning local-global contextual adaptation for multi-person pose estimation[EB/OL].[2023-04-21]. https://arxiv.org/abs/2109.03622v2.