[1] XIAO B, WU H P, WEI Y C. Simple baselines for human pose estimation and tracking[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 472-487.
[2] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 5686-5696.
[3] LIU S G, LI Y, HUA G G. Human pose estimation in video via structured space learning and halfway temporal evaluation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(7): 2029-2038. DOI: 10.1109/TCSVT.2018.2858828
[4] NIE X C, FENG J S, XING J L, et al. Hierarchical contextual refinement networks for human pose estimation[J]. IEEE Transactions on Image Processing, 2019, 28(2): 924-936. DOI: 10.1109/TIP.2018.2872628
[5] ZHANG J, CHEN Z, TAO D C. Towards high performance human keypoint detection[J]. International Journal of Computer Vision, 2021, 129(9): 2639-2662. DOI: 10.1007/s11263-021-01482-8
[6] CAI Y H, WANG Z C, LUO Z X, et al. Learning delicate local representations for multi-person pose estimation[C]//Proceedings of the 16th European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 455-472.
[7] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. [2022-06-10]. https://arxiv.org/abs/2010.11929.
[8] YANG S, QUAN Z B, NIE M, et al. TransPose: keypoint localization via Transformer[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2022: 11782-11792.
[9] LI Y J, ZHANG S K, WANG Z C, et al. TokenPose: learning keypoint tokens for human pose estimation[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2022: 11293-11302.
[10] WANG J D, SUN K, CHENG T H, et al. Deep high-resolution representation learning for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349-3364. DOI: 10.1109/TPAMI.2020.2983686
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010.
[12] YUAN Y, FU R, HUANG L, et al. HRFormer: high-resolution vision Transformer for dense predict[C]//Proceedings of Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2021: 7281-7293.
[13]
[14] CHEN C F R, FAN Q F, PANDA R. CrossViT: cross-attention multi-scale vision Transformer for image classification[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2022: 347-356.
[15] LEE K H, CHEN X, HUA G, et al. Stacked cross attention for image-text matching[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 212-228.
[16] WEI X, ZHANG T Z, LI Y, et al. Multi-modality cross attention network for image and sentence matching[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 10938-10947.
[17] KIM H, BANSAL M. Improving visual question answering by referring to generated paragraph captions[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, USA: Association for Computational Linguistics, 2019: 3606-3612.
[18] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 936-944.
[19] YANG W, LI S, OUYANG W L, et al. Learning feature pyramids for human pose estimation[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2017: 1290-1299.
[20] CHEN Y L, WANG Z C, PENG Y X, et al. Cascaded pyramid network for multi-person pose estimation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 7103-7112.
[21] HE J J, DENG Z Y, QIAO Y. Dynamic multi-scale filters for semantic segmentation[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2020: 3561-3571.
[22] TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 10778-10787.
[23] GAO S H, CHENG M M, ZHAO K, et al. Res2Net: a new multi-scale backbone architecture[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 652-662. DOI: 10.1109/TPAMI.2019.2938758
[24] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2022: 9992-10002.
[25] HEO B, YUN S, HAN D, et al. Rethinking spatial dimensions of vision Transformers[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2022: 11916-11925.
[26]
[27] WANG W H, XIE E Z, LI X, et al. Pyramid vision Transformer: a versatile backbone for dense prediction without convolutions[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2022: 548-558.
[28] NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[C]//Proceedings of the 14th European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 483-499.
[29] KE L P, CHANG M C, QI H G, et al. Multi-scale structure-aware network for human pose estimation[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 731-746.
[30] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2016: 770-778.
[31] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the 13th European Conference on Computer Vision. Berlin, Germany: Springer, 2014: 740-755.
[32] GENG Z G, SUN K, XIAO B, et al. Bottom-up human pose estimation via disentangled keypoint regression[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2021: 14671-14681.
[33] ZHANG F, ZHU X T, DAI H B, et al. Distribution-aware coordinate representation for human pose estimation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 7091-7100.
[34] ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2014: 3686-3693.
[35]
[36] ZHANG F, ZHU X T, YE M. Fast human pose estimation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 3512-3521.