| 1 | DENG M L, GAO Z D, LI L, et al. Overview of human behavior recognition based on deep learning. Computer Engineering and Applications, 2022, 58(13): 14-26. | 
																													
																						| 2 | WANG H, SCHMID C. Action recognition with improved trajectories[C]//Proceedings of the IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2013: 3551-3558. | 
																													
																						| 3 | KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2014: 1725-1732. | 
																													
| 4 | SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2014: 568-576. | 
																													
| 5 | XIE Z, ZHOU Y, WU K W, et al. Behavior recognition based on spatiotemporal attention LSTM. Journal of Computer Science, 2021, 44(2): 261-274. | 
																													
																						| 6 | TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2015: 4489-4497. | 
																													
																						| 7 | QIU Z F, YAO T, MEI T. Learning spatio-temporal representation with pseudo-3D residual networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2017: 5534-5542. | 
																													
| 8 | DIBA A L, FAYYAZ M, SHARMA V, et al. Temporal 3D ConvNets: new architecture and transfer learning for video classification[EB/OL]. [2023-09-18]. https://arxiv.org/abs/1711.08200v1. | 
																													
																						| 9 | HARA K, KATAOKA H, SATOH Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 6546-6555. | 
																													
| 10 | CHAUDHARI S, MITHAL V, POLATKAN G, et al. An attentive survey of attention models. ACM Transactions on Intelligent Systems and Technology, 2021, 12(5): 1-32. | 
																													
																						| 11 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 7132-7141. | 
																													
																						| 12 | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 3-19. | 
																													
																						| 13 | CAO Y, XU J R, LIN S, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2019: 1971-1980. | 
																													
																						| 14 | WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 11531-11539. | 
																													
																						| 15 |  | 
																													
																						| 16 |  | 
																													
																						| 17 | CARVALHO S R, BERTAGNOLLI N M, FOLKMAN T, et al. A temporal bottleneck attention architecture for video action recognition: WO2021US59372[P]. 2022-05-19. | 
																													
| 18 | LI C H, ZHANG J, YAO J C. Streamer action recognition in live video with spatial-temporal attention and deep dictionary learning. Neurocomputing, 2021, 453: 383-392. doi: 10.1016/j.neucom.2020.07.148 | 
																													
| 19 | GONG J, LUO C, LUO Q. Action recognition model based on attention mechanism and residual network. Electronic Measurement Technology, 2021, 44(14): 111-116. | 
																													
																						| 20 |  | 
																													
																						| 21 |  | 
																													
																						| 22 | ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 6848-6856. | 
																													
| 23 | LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327. doi: 10.1109/TPAMI.2018.2858826 | 
																													
| 24 | ZHOU B, LI J F. Human behavior recognition combined with object detection. Journal of Automation, 2020, 46(9): 1961-1970. | 
																													
| 25 | WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks for action recognition in videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(11): 2740-2755. |