[1] ZHENG C, WU W, CHEN C, et al. Deep learning-based
human pose estimation: A survey[J]. ACM Computing
Surveys, 2023, 56(1): 1-37.
[2] 冯晓月, 宋杰. 二维人体姿态估计研究进展[J]. 计算机科学, 2020, 47(11): 128-136.
FENG X Y, SONG J. Advances in two-dimensional human pose estimation research[J]. Computer Science, 2020, 47(11): 128-136. (in Chinese)
[3] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30.
[4] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018. (2019-05-24)[2023-06-10]. https://doi.org/10.48550/arXiv.1810.04805.
[5] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[J]. 2018.
[6] YANG A, PAN J, LIN J, et al. Chinese CLIP: Contrastive vision-language pretraining in Chinese[J]. arXiv preprint arXiv:2211.01335, 2022. (2023-05-23)[2023-06-10]. https://doi.org/10.48550/arXiv.2211.01335.
[7] AFKANPOUR A, ADEEL S, BASSANI H, et al. BERT for long documents: A case study of automated ICD coding[J]. arXiv preprint arXiv:2211.02519, 2022. (2022-11-04)[2023-06-10]. https://doi.org/10.48550/arXiv.2211.02519.
[8] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020. (2022-06-03)[2023-06-10]. https://doi.org/10.48550/arXiv.2010.11929.
[9] XU Y, ZHANG J, ZHANG Q, et al. ViTPose: Simple vision transformer baselines for human pose estimation[J]. Advances in Neural Information Processing Systems, 2022, 35: 38571-38584.
[10] YUAN Y, FU R, HUANG L, et al. HRFormer: High-resolution transformer for dense prediction[J]. arXiv preprint arXiv:2110.09408, 2021. (2021-11-07)[2023-06-10]. https://doi.org/10.48550/arXiv.2110.09408.
[11] MAO W, GE Y, SHEN C, et al. TFPose: Direct human pose estimation with transformers[J]. arXiv preprint arXiv:2103.15320, 2021. (2021-03-29)[2023-06-10]. https://doi.org/10.48550/arXiv.2103.15320.
[12] 孙琪翔, 何宁, 张敬尊, 等. 基于非局部高分辨率网络的轻量化人体姿态估计方法[J]. 计算机应用, 2022, 42(5): 1398-1406.
SUN Q X, HE N, ZHANG J Z, et al. A lightweight human pose estimation method based on nonlocal high-resolution networks[J]. Computer Applications, 2022, 42(5): 1398-1406. (in Chinese)
[13] ZOPH B, VASUDEVAN V, SHLENS J, et al. Learning
transferable architectures for scalable image
recognition[C]//Proceedings of the IEEE conference on
computer vision and pattern recognition. 2018: 8697-8710.
[14] 胡挺, 祝永新, 田犁, 等. 面向移动平台的轻量级卷积神经网络架构[J]. 计算机工程, 2019, 45(1): 17-22.
HU T, ZHU Y X, TIAN L, et al. Lightweight convolutional neural network architecture for mobile platforms[J]. Computer Engineering, 2019, 45(1): 17-22. (in Chinese)
[15] 高坤, 李汪根, 束阳, 等. 融入密集连接的多尺度轻量级人体姿态估计[J]. 计算机工程与应用, 2022, 58(24): 196-204.
GAO K, LI W G, SHU Y, et al. Multi-scale lightweight human pose estimation incorporating dense connectivity[J]. Computer Engineering and Applications, 2022, 58(24): 196-204. (in Chinese)
[16] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015. (2015-03-09)[2023-06-10]. https://doi.org/10.48550/arXiv.1503.02531.
[17] ZAGORUYKO S, KOMODAKIS N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer[J]. arXiv preprint arXiv:1612.03928, 2016. (2017-02-12)[2023-06-10]. https://doi.org/10.48550/arXiv.1612.03928.
[18] HEO B, KIM J, YUN S, et al. A comprehensive overhaul of
feature distillation[C]//Proceedings of the IEEE/CVF
International Conference on Computer Vision. 2019:
1921-1930.
[19] HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural network[J]. Advances in Neural Information Processing Systems, 2015, 28.
[20] GUO Y, YAO A, CHEN Y. Dynamic network surgery for efficient DNNs[J]. Advances in Neural Information Processing Systems, 2016, 29.
[21] HUANG Z, WANG N. Data-driven sparse structure selection
for deep neural networks[C]//Proceedings of the European
conference on computer vision (ECCV). 2018: 304-320.
[22] LUO J H, ZHANG H, ZHOU H Y, et al. ThiNet: Pruning CNN filters for a thinner net[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(10): 2525-2538.
[23] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017. (2017-04-17)[2023-06-10]. https://doi.org/10.48550/arXiv.1704.04861.
[24] ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856.
[25] HAN K, WANG Y, TIAN Q, et al. GhostNet: More features from cheap operations[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 1580-1589.
[26] YU C, XIAO B, GAO C, et al. Lite-HRNet: A lightweight high-resolution network[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 10440-10450.
[27] CHENG B W, XIAO B, WANG J D, et al. HigherHRNet: Scale-aware representation learning for bottom-up human pose estimation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 5385-5394.
[28] GENG Z G, SUN K, XIAO B, et al. Bottom-up human pose estimation via disentangled keypoint regression[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2021: 14671-14681.
[29] 刘圣杰, 何宁, 于海港, 等. 引入坐标注意力和自注意力的人体关键点检测研究[J]. 计算机工程, 2022, 48(12): 86-94.
LIU S J, HE N, YU H G, et al. Research on human keypoint detection incorporating coordinate attention and self-attention[J]. Computer Engineering, 2022, 48(12): 86-94. (in Chinese)
[30] TSOTSOS J K. Analyzing vision at the complexity level[J]. Behavioral and Brain Sciences, 1990, 13(3): 423-445.
[31] TSOTSOS J K. A computational perspective on visual
attention[M]. MIT Press, 2011.
[32] HU J, SHEN L, SUN G. Squeeze-and-excitation
networks[C]//Proceedings of the IEEE conference on
computer vision and pattern recognition. 2018: 7132-7141.
[33] WANG Q, WU B, ZHU P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020.
[34] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European conference on computer vision. Berlin, Germany: Springer, 2018: 3-19.
[35] CHEN Y, KALANTIDIS Y, LI J, et al. A²-Nets: Double attention networks[J]. Advances in Neural Information Processing Systems, 2018, 31.
[36] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 7794-7803.
[37] CAO Y, XU J, LIN S, et al. GCNet: Non-local networks meet squeeze-excitation networks and beyond[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Washington D.C., USA: IEEE Press, 2019.
[38] LIU J J, HOU Q, CHENG M M, et al. Improving convolutional networks with self-calibrated convolutions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 10096-10105.
[39] GAO Z L, XIE J T, WANG Q L, et al. Global second-order pooling convolutional networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2019: 3019-3028.
[40] HUANG Z, WANG X, HUANG L, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2019: 603-612.
[41] CHEN L C, ZHU Y, PAPANDREOU G, et al.
Encoder-decoder with atrous separable convolution for
semantic image segmentation[C]//Proceedings of the
European conference on computer vision (ECCV). 2018:
801-818.