[1] 汪威. 基于深度学习的自动扶梯视频人体动作识别[J]. 软件工程, 2021(9): 24-27.
Wang Wei. Human Action Recognition in Escalator Videos Based on Deep Learning[J]. Software Engineering, 2021(9): 24-27.
[2] 刘国平, 王南星, 周毅, 等. 基于改进ReliefF算法的哑铃动作识别[J]. 科学技术与工程, 2019, 19(32): 6.
Liu Guoping, Wang Nanxing, Zhou Yi, et al. Dumbbell Action Recognition Based on an Improved ReliefF Algorithm[J]. Science Technology and Engineering, 2019, 19(32): 6.
[3] 席志红, 冯宇. 基于改进型C3D网络的人体行为识别算法[J]. 应用科技, 2021(7): 47-53.
Xi Zhihong, Feng Yu. Human Activity Recognition Algorithm Based on an Improved C3D Network[J]. Applied Science and Technology, 2021(7): 47-53.
[4] Moradi M, Blagec K, Haberl F, et al. GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain[EB/OL]. [2021-09-06]. https://doi.org/10.48550/arXiv.2109.02555.
[5] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, USA: Association for Computational Linguistics, 2019: 4171-4186.
[6] He K, Chen X, Xie S, et al. Masked Autoencoders Are Scalable Vision Learners[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 16000-16009.
[7] Luo H, Ji L, Zhong M, et al. CLIP4Clip: An Empirical Study of CLIP for End-to-End Video Clip Retrieval and Captioning[J]. Neurocomputing, 2022, 508: 293-304.
[8] Zolfaghari M, Singh K, Brox T. ECO: Efficient Convolutional Network for Online Video Understanding[C]//European Conference on Computer Vision. Cham: Springer, 2018: 713-730.
[9] Yin D, Hu L, Li B, et al. Adapter is All You Need for Tuning Visual Tasks[EB/OL]. [2023-11-25]. https://doi.org/10.48550/arXiv.2311.15010.
[10] Sung Y L, Cho J, Bansal M. VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 5217-5227.
[11] Chen S, Ge C, Tong Z, et al. AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition[J]. Advances in Neural Information Processing Systems, 2022, 35: 16664-16678.
[12] Zaken E B, Ravfogel S, Goldberg Y. BitFit: Simple Parameter-Efficient Fine-Tuning for Transformer-Based Masked Language-Models[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Dublin, Ireland: Association for Computational Linguistics, 2022: 1-9.
[13] Hu E J, Shen Y, Wallis P, et al. LoRA: Low-Rank Adaptation of Large Language Models[EB/OL]. [2021-10-16]. https://doi.org/10.48550/arXiv.2106.09685.
[14] Pan J, Lin Z, Zhu X, et al. ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning[J]. Advances in Neural Information Processing Systems, 2022, 35: 26462-26477.
[15] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[EB/OL]. [2020-10-22]. https://doi.org/10.48550/arXiv.2010.11929.
[16] Deng J, Dong W, Socher R, et al. ImageNet: A Large-Scale Hierarchical Image Database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2009: 248-255.
[17] Radford A, Kim J W, Hallacy C, et al. Learning Transferable Visual Models From Natural Language Supervision[C]//Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021: 8748-8763.
[18] Xu H, Ghosh G, Huang P Y, et al. VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding[EB/OL]. [2021-09-28]. https://doi.org/10.48550/arXiv.2109.14084.
[19] Bhattacharjee A, Moitra A, Panda P. ClipFormer: Key-Value Clipping of Transformers on Memristive Crossbars for Write Noise Mitigation[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, 43(4): 592-601.
[20] Szegedy C, Liu W, Jia Y, et al. Going Deeper with Convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2015: 1-9.
[21] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[22] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need[EB/OL]. [2017-07-12]. https://doi.org/10.48550/arXiv.1706.03762.
[23] Kay W, Carreira J, Simonyan K, et al. The Kinetics Human Action Video Dataset[EB/OL]. [2017-05-19]. https://doi.org/10.48550/arXiv.1705.06950.
[24] Soomro K, Zamir A R, Shah M. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild[EB/OL]. [2012-12-03]. https://doi.org/10.48550/arXiv.1212.0402.
[25] Kuehne H, Jhuang H, Garrote E, et al. HMDB: A Large Video Database for Human Motion Recognition[C]//Proceedings of the International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2011: 2556-2563.
[26] Tran D, Wang H, Torresani L, et al. Video Classification with Channel-Separated Convolutional Networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2019: 5552-5561.
[27] Feichtenhofer C. X3D: Expanding Architectures for Efficient Video Recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 203-213.
[28] Neimark D, Bar O, Zohar M, et al. Video Transformer Network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Washington D.C., USA: IEEE Press, 2021: 3163-3172.
[29] K. Z., M. L., X. G., et al. Temporal Shift Module-Based Vision Transformer Network for Action Recognition[J]. IEEE Access, 2024, 12: 47246-47257.
[30] Zhang H, Hao Y, Ngo C W. Token Shift Transformer for Video Classification[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York, USA: ACM Press, 2021: 917-925.
[31] Zhang H, Cheng L, Hao Y, et al. Long-term Leap Attention, Short-term Periodic Shift for Video Classification[C]//Proceedings of the 30th ACM International Conference on Multimedia. New York, USA: ACM Press, 2022: 5773-5782.
[32] Hao Y, Zhang H, Ngo C W, et al. Group Contextualization for Video Recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 928-938.
[33] Chen Y, et al. AGPN: Action Granularity Pyramid Network for Video Action Recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(8): 3912-3923.
[34] Ju C, Han T, Zheng K, et al. Prompting Visual-Language Models for Efficient Video Understanding[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 105-124.
[35] Sheng X X, Li K C, Shen Z Q, et al. A Progressive Difference Method for Capturing Visual Tempos on Action Recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(3): 977-987.
[36] Rasheed H, Khattak M U, Maaz M, et al. Fine-Tuned CLIP Models are Efficient Video Learners[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2023: 6545-6554.
[37] Wang Q, Hu Q, Gao Z, et al. AMS-Net: Modeling Adaptive Multi-Granularity Spatio-Temporal Cues for Video Action Recognition[EB/OL]. [2023-10-13]. DOI: 10.1109/TNNLS.2023.3321141.
[38] Wang M, Xing J, Mei J, et al. ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition[J]. IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(1): 625-637. DOI: 10.1109/TNNLS.2023.3331841.
[39] Gao H, Xie S, Yan R, et al. Hierarchical Motion-Enhanced Matching Framework for Few-Shot Action Recognition[J]. IEEE Transactions on Multimedia, 2025, 27: 2450-2462.
[40] Li Z, Ping X, et al. Open-Vocabulary Action Recognition with Masked Visual Prompt and Verb Semantic Reconstruction[C]//2025 6th International Conference on Computer Engineering and Application (ICCEA). Hangzhou, China, 2025: 1-6. DOI: 10.1109/ICCEA65460.2025.11103216.
[41] Wang L, Tong Z, Ji B, et al. TDN: Temporal Difference Networks for Efficient Action Recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2021: 1895-1904.
[42] Li Y, Ji B, Shi X, et al. TEA: Temporal Excitation and Aggregation for Action Recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 909-918.
[43] Jiang B, Yan J, Wang M, et al. STM: Spatio-Temporal and Motion Encoding for Action Recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2019: 2000-2009.
[44] Zhu L, Tran D, Sevilla-Lara L, et al. FASTER Recurrent Networks for Efficient Video Classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York, USA: AAAI Press, 2020, 34(7): 13098-13105.
[45] Duan H, Zhao Y, Xiong Y, et al. Omni-sourced Webly-supervised Learning for Video Recognition[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2020: 670-688.
[46] 王晓路, 汶建荣. 基于运动-时间感知的人体动作识别方法[J]. 计算机工程, 2025, 51(1): 216-224.
Wang Xiaolu, Wen Jianrong. Human Action Recognition Method Based on Motion-Time Perception[J]. Computer Engineering, 2025, 51(1): 216-224.
[47] 龚安, 赵宗泽, 张贵临. 多模态交叉注意力融合的视频动作识别[J]. 信息技术, 2025(6): 70-75, 80. DOI: 10.13274/j.cnki.hdzj.2025.06.012.
Gong An, Zhao Zongze, Zhang Guilin. Multimodal Cross-Attention Fusion for Video Action Recognition[J]. Information Technology, 2025(6): 70-75, 80. DOI: 10.13274/j.cnki.hdzj.2025.06.012.
[48] 孙凯铭. 基于注意力机制的时空融合动作识别方法[D]. 大连交通大学, 2024. DOI: 10.26990/d.cnki.gsltc.2024.000356.
Sun Kaiming. Spatiotemporal Fusion Action Recognition Method Based on Attention Mechanism[D]. Dalian Jiaotong University, 2024. DOI: 10.26990/d.cnki.gsltc.2024.000356.
[49] He Y, Yang Y, Li C, et al. Video Human Action Recognition and Classification Based on Channel Attention and LSTM Transformer[C]//2025 8th International Conference on Artificial Intelligence and Big Data (ICAIBD). Chengdu, China, 2025: 552-558. DOI: 10.1109/ICAIBD64986.2025.11082043.
[50] 王小伟, 沈燕飞, 邢庆君. 参数高效化微调的双分支视频动作识别方法[J]. 河南理工大学学报(自然科学版), 2025, 44(4): 21-28. DOI: 10.16186/j.cnki.1673-9787.2025020018.
Wang Xiaowei, Shen Yanfei, Xing Qingjun. A Dual-Branch Video Action Recognition Method Based on Parameter-Efficient Fine-Tuning[J]. Journal of Henan Polytechnic University (Natural Science), 2025, 44(4): 21-28. DOI: 10.16186/j.cnki.1673-9787.2025020018.