[1].N. Siddiqui, P. Tirupattur, and M. Shah. DVANet: Disentangling View and Action Features for Multi-View Action Recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2024: 4873-4881.
[2].Meng Xiangpu, Li Shuo, Yuan Mingzhe, et al. Action Recognition Based on Human Skeleton: Review and Prospect[J]. Information and Control, 2025, 54(01): 1-27. (in Chinese)
[3].A. Sanchez-Caballero, D. Fuentes-Jimenez, and C. Losada-Gutiérrez. Exploiting the convlstm: Human action recognition using raw depth video-based recurrent neural networks[J]. arXiv preprint arXiv:2006.07744, 2020.
[4].Y. Ben-Shabat, O. Shrout, and S. Gould. 3dinaction: Understanding human actions in 3d point clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 19978-19987.
[5].Lü Lulu, Huang Yi, Gao Junyu, et al. Multimodal Zero-Shot Human Action Recognition[J]. Journal of Image and Graphics, 2021, 26(07): 1658-1667. (in Chinese)
[6].Z. Chen, Y. Luo, R. Qiu, S. Wang, Z. Huang, J. Li, and Z. Zhang. Semantics disentangling for generalized zero-shot learning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 8712-8720.
[7].Z. Han, Z. Fu, S. Chen, and J. Yang. Contrastive embedding for generalized zero-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 2371-2381.
[8].Z. Wang, J. Liang, R. He, N. Xu, Z. Wang, and T. Tan. Improving zero-shot generalization for clip with synthesized prompts[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 3032-3042.
[9].Zhang Haitao, Su Lin. Zero-Shot Image Recognition with Variational Autoencoder Based on Knowledge Graph[J]. Computer Engineering and Applications, 2023, 59(01): 236-243. (in Chinese)
[10].Y. Ye, Y. He, T. Pan, J. Li, and H. T. Shen. Alleviating domain shift via discriminative learning for generalized zero-shot learning[J]. IEEE Transactions on Multimedia, 2021, 23: 1325-1337.
[11].D. Mandal, S. Narayan, S. K. Dwivedi, V. Gupta, S. Ahmed, F. S. Khan, and L. Shao. Out-of-distribution detection for generalized zero-shot action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 9985-9993.
[12].B. Ni, H. Peng, M. Chen, S. Zhang, G. Meng, J. Fu, S. Xiang, and H. Ling. Expanding language-image pretrained models for general video recognition[C]//European Conference on Computer Vision. 2022: 1-18.
[13].J. Gao, Y. Hou, Z. Guo, and H. Zheng. Learning spatio-temporal semantics and cluster relation for zero-shot action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(7): 6519-6530.
[14].L. Momeni, M. Caron, A. Nagrani, A. Zisserman, and C. Schmid. Verbs in action: Improving verb understanding in video-language models[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 15579-15591.
[15].K. Cheng, Y. Zhang, C. Cao, et al. Decoupling gcn with dropgraph module for skeleton-based action recognition[C]//European Conference on Computer Vision. 2020: 536-553.
[16].H. T. Gao, R. H. Jiang, Z. Dong, et al. Spatial-temporal-decoupled masked pre-training for spatiotemporal forecasting[C]//Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 2024: 3998-4006.
[17].J. Xie, Y. Meng, Y. Zhao, et al. Dynamic Semantic-Based Spatial-Temporal Graph Convolution Network for Skeleton-Based Human Action Recognition[J]. IEEE Transactions on Image Processing, 2024.
[18].H. Zheng, Y. S. Zhao, B. Zhang, et al. A separable spatial-temporal graph learning approach for skeleton-based action recognition[J]. IEEE Sensors Letters, 2024.
[19].H. Cui, R. Huang, R. Zhang, et al. Dstsa-gcn: Advancing skeleton-based gesture recognition with semantic-aware spatio-temporal topology modeling[J]. Neurocomputing, 2025, 637: 130066.
[20].A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov. Devise: A deep visual-semantic embedding model[C]//Advances in Neural Information Processing Systems. 2013.
[21].H. Tsai, L. Huang, and R. Salakhutdinov. Learning robust visual-semantic embeddings[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 3571-3580.
[22].B. Jasani and A. Mazagonwalla. Skeleton based zero shot action recognition in joint pose-language semantic space[J]. arXiv preprint arXiv:1911.11344, 2019.
[23].M. Wray, D. Larlus, G. Csurka, and D. Damen. Fine-grained action retrieval through multiple parts-of-speech embeddings[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 450-459.
[24].E. Schonfeld, S. Ebrahimi, S. Sinha, T. Darrell, and Z. Akata. Generalized zero-and few-shot learning via aligned variational autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 8247-8255.
[25].P. Gupta, D. Sharma, and R. K. Sarvadevabhatla. Syntactically guided generative embeddings for zero-shot skeleton action recognition[C]//2021 IEEE International Conference on Image Processing. 2021: 439-443.
[26].Y. Zhou, W. Qiang, A. Rao, N. Lin, B. Su, and J. Wang. Zero-shot skeleton-based action recognition via mutual information estimation and maximization[C]//Proceedings of the 31st ACM International Conference on Multimedia. 2023: 5302-5310.
[27].S.-W. Li, Z.-X. Wei, W.-J. Chen, Y.-H. Yu, C.-Y. Yang, and J. Y. Hsu. Sa-dvae: Improving zero-shot skeleton-based action recognition by disentangled variational autoencoders[C]//European Conference on Computer Vision. 2024: 447-462.
[28].M.-Z. Li, Z. Jia, Z. Zhang, Z. Ma, and L. Wang. Multi-semantic Fusion Model For Generalized Zero-Shot Skeleton-Based Action Recognition[C]//International Conference on Image and Graphics. 2023: 68-80.
[29].Y. Chen, J. Guo, T. He, X. Lu, and L. Wang. Fine-grained side information guided dual-prompts for zero-shot skeleton action recognition[C]//Proceedings of the 32nd ACM International Conference on Multimedia. 2024: 778-786.
[30].A. Zhu, Q. Ke, M. Gong, and J. Bailey. Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 18761-18770.