[1] Liu Xin, Shi Henglin, Chen Haoyu, et al. iMiGUE: An identity-free video dataset for micro-gesture understanding and emotion analysis[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 10631-10642.
[2] Aviezer H, Trope Y, Todorov A. Body cues, not facial
expressions, discriminate between intense positive and
negative emotions[J]. Science, 2012, 338(6111):
1225-1229.
[3] Axtell R E, Fornwald M. Gestures: The do's and taboos of body language around the world[M]. New York: John Wiley & Sons, 1998.
[4] 王园园,曹慧,王廷蔚.基于多粒度交叉注意力的骨架动作识别方法[J/OL].计算机工程,1-10[2025-07-15].https://doi.org/10.19678/j.issn.1000-3428.0252088.
Wang Y, Cao H, Wang T. Skeleton-based action recognition method based on multi-granularity cross attention[J/OL]. Computer Engineering, 1-10[2025-07-15]. https://doi.org/10.19678/j.issn.1000-3428.02
[5] Yan Sijie, Xiong Yuanjun, Lin Dahua. Spatial temporal
graph convolutional networks for skeleton-based action
recognition[C]//Proceedings of the AAAI conference on
artificial intelligence. 2018, 32(1).
[6] Liu Ziyu, Zhang Hongwen, Chen Zhenghao, et al.
Disentangling and unifying graph convolutions for
skeleton-based action recognition[C]//Proceedings of the
IEEE/CVF conference on computer vision and pattern
recognition. 2020: 143-152.
[7] Zhou Bolei, Andonian A, Oliva A, et al. Temporal
relational reasoning in videos[C]//Proceedings of the
European conference on computer vision (ECCV). 2018:
803-818.
[8] Lin Ji, Gan Chuang, Han Song. TSM: Temporal shift
module for efficient video understanding[C]//Proceedings
of the IEEE/CVF international conference on computer
vision. 2019: 7083-7093.
[9] Zhou Yuxuan, Cheng Zhiqi, Li Chao, et al. Hypergraph
transformer for skeleton-based action recognition[J].
arXiv preprint arXiv:2211.09590, 2022.
[10] Huang Hexiang, Guo Xupeng, Huang Weipeng, et al. Micro-gesture Classification Based on Ensemble Hypergraph-convolution Transformer[C]//MiGA@IJCAI. 2023.
[11] Das S, Sharma S, Dai R, et al. VPN: Learning video-pose
embedding for activities of daily living[C]//European
conference on computer vision. Cham: Springer
International Publishing, 2020: 72-90.
[12] Ahn D, Kim S, Hong H, et al. STAR-Transformer: A
spatio-temporal cross attention transformer for human
action recognition[C]//Proceedings of the IEEE/CVF
winter conference on applications of computer vision.
2023: 3330-3339.
[13] Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision[C]//International conference on machine learning. PMLR, 2021: 8748-8763.
[16] Simonyan K, Zisserman A. Two-stream convolutional
networks for action recognition in videos[J]. Advances in
neural information processing systems, 2014, 27.
[17] Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
[18] 张聪聪,何宁,孙琪翔,等.基于注意力机制的3D DenseNet人体动作识别方法[J].计算机工程,2021,47(11):313-320. DOI:10.19678/j.issn.1000-3428.0059640.
Zhang C, He N, Sun Q, et al. Human action recognition method based on attention mechanism and 3D DenseNet[J]. Computer Engineering, 2021, 47(11): 313-320. DOI:10.19678/j.issn.1000-3428.0059640.
[19] Carreira J, Zisserman A. Quo vadis, action recognition? A new model and the kinetics dataset[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6299-6308.
[20] Soomro K, Zamir A R, Shah M. UCF101: A dataset of
101 human actions classes from videos in the wild[J].
arXiv preprint arXiv:1212.0402, 2012.
[21] Cheng Qin, Cheng Jun, Liu Zhen, et al. A dense-sparse complementary network for human action recognition based on RGB and skeleton modalities[J]. Expert Systems with Applications, 2024, 244: 123061.
[22] Kim S, Ahn D, Ko B C. Cross-modal learning with 3D deformable attention for action recognition[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2023: 10265-10275.
[23] Sigurdsson G A, Gupta A, Schmid C, et al. Charades-ego:
A large-scale dataset of paired third and first person
videos[J]. arXiv preprint arXiv:1804.09626, 2018.
[24] Pan Junting, Lin Ziyi, Zhu Xiatian, et al. ST-Adapter:
Parameter-efficient image-to-video transfer learning[J].
Advances in Neural Information Processing Systems,
2022, 35: 26462-26477.
[25] Duan Haodong, Zhao Yue, Chen Kai, et al. Revisiting
skeleton-based action recognition[C]//Proceedings of the
IEEE/CVF conference on computer vision and pattern
recognition. 2022: 2969-2978.
[26] Wang Y, Dong Z, Li P, et al. A Multimodal Micro-gesture Classification Model Based on CLIP[C]//MiGA@IJCAI. 2024.
[27] Cao Kaidi, Wei Colin, Gaidon A, et al. Learning
imbalanced datasets with label-distribution-aware margin
loss[J]. Advances in neural information processing
systems, 2019, 32.
[28] Kuehne H, Jhuang H, Garrote E, et al. HMDB: a large
video database for human motion recognition[C]//2011
International conference on computer vision. IEEE, 2011:
2556-2563.
[29] Lin Ziyi, Geng Shijie, Zhang Renrui, et al. Frozen CLIP
models are efficient video learners[C]//European
Conference on Computer Vision. Cham: Springer Nature
Switzerland, 2022: 388-404.
[30] Li Xinhao, Zhu Yuhan, Wang Limin. ZeroI2V: Zero-cost
adaptation of pre-trained transformers from image to
video[C]//European Conference on Computer Vision.
Cham: Springer Nature Switzerland, 2024: 425-443.
[31] Xie Saining, Sun Chen, Huang Jonathan, et al. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 305-321.
[32] Ju Chen, Han Tengda, Zheng Kunhao, et al. Prompting visual-language models for efficient video understanding[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 105-124.
[33] Zhao Zhiyu, Huang Bingkun, Xing Sen, et al.
Asymmetric masked distillation for pre-training small
foundation models[C]//Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition.
2024: 18516-18526.