| 1 | 胡钦太, 伍文燕, 冯广, 等. 深度学习支持下多模态学习行为可解释性分析研究. 电化教育研究, 2021, 42 (11): 77-83. |
|  | HU Q T, WU W Y, FENG G, et al. A study on interpretable analysis of multimodal learning behavior supported by deep learning. e-Education Research, 2021, 42 (11): 77-83. |
| 2 |  |
| 3 | 刘清堂, 李小娟, 谢魁, 等. 多模态学习分析实证研究的发展与展望. 电化教育研究, 2022, 43 (1): 71-78, 85. |
|  | LIU Q T, LI X J, XIE K, et al. Developments and prospects of empirical research on multimodal learning analysis. e-Education Research, 2022, 43 (1): 71-78, 85. |
| 4 | 尹宏鹏, 陈波, 柴毅, 等. 基于视觉的目标检测与跟踪综述. 自动化学报, 2016, 42 (10): 1466-1489. |
|  | YIN H P, CHEN B, CHAI Y, et al. Vision-based object detection and tracking: a review. Acta Automatica Sinica, 2016, 42 (10): 1466-1489. |
| 5 | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 779-788. |
| 6 |  |
| 7 |  |
| 8 | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[M]. Berlin, Germany: Springer, 2016. |
| 9 |  |
| 10 | AGRAWAL P, GIRSHICK R, MALIK J. Analyzing the performance of multilayer neural networks for object recognition[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2014: 329-344. |
| 11 | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031 |
| 12 | 王泽杰, 沈超敏, 赵春, 等. 融合人体姿态估计和目标检测的学生课堂行为识别. 华东师范大学学报(自然科学版), 2022, (2): 55-66. |
|  | WANG Z J, SHEN C M, ZHAO C, et al. Recognition of classroom learning behaviors based on the fusion of human pose estimation and object detection. Journal of East China Normal University (Natural Science), 2022, (2): 55-66. |
| 13 | 谭暑秋, 汤国放, 涂媛雅, 等. 教室监控下学生异常行为检测系统. 计算机工程与应用, 2022, 58 (7): 176-184. |
|  | TAN S Q, TANG G F, TU Y Y, et al. Classroom monitoring students abnormal behavior detection system. Computer Engineering and Applications, 2022, 58 (7): 176-184. |
| 14 | ZHANG Y W, WU Z, CHEN X J, et al. Classroom behavior recognition based on improved YOLOv3[C]//Proceedings of International Conference on Artificial Intelligence and Education. Washington D. C., USA: IEEE Press, 2020: 93-97. |
| 15 | LI L N, LIU M H, SUN L Y, et al. ET-YOLOv5s: toward deep identification of students' in-class behaviors. IEEE Access, 2022, 10: 44200-44211. doi: 10.1109/ACCESS.2022.3169586 |
| 16 | CHEN H W, ZHOU G H, JIANG H X. Student behavior detection in the classroom based on improved YOLOv8. Sensors, 2023, 23 (20): 8385. doi: 10.3390/s23208385 |
| 17 | WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2023: 7464-7475. |
| 18 | LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 2117-2125. |
| 19 | LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 8759-8768. |
| 20 |  |
| 21 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2024-02-03]. https://arxiv.org/abs/2010.11929. |
| 22 | DONG X Y, BAO J M, CHEN D D, et al. CSWin Transformer: a general vision transformer backbone with cross-shaped windows[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2022: 12124-12134. |
| 23 | LAU K W, PO L M, REHMAN Y A U. Large separable kernel attention: rethinking the large kernel attention design in CNN. Expert Systems with Applications, 2024, 236: 121352. doi: 10.1016/j.eswa.2023.121352 |
| 24 |  |
| 25 |  |