Student Classroom Behavior Detection Algorithm Based on Improved YOLOv8 in Smart Education

doi:10.19678/j.issn.1000-3428.0069597

Abstract


To accelerate the digital transformation of education, the precise analysis and empirical application of AI technology across the entire process of teaching and learning have become a research hotspot. Student classroom behavior detection suffers from low detection accuracy, high density of bounding boxes, severe overlap and occlusion, large scale variations, and imbalanced data volume. To address these problems, this paper establishes a student classroom behavior dataset (DBS Dataset) and proposes VWE-YOLOv8, a student classroom behavior detection algorithm based on an improved YOLOv8. First, it introduces the CSWin-Transformer attention mechanism to enhance the model's ability to extract global information from images, which improves the network's detection accuracy. Second, it integrates the Large Separable Kernel Attention (LSKA) module into the SPPF architecture to improve the recognition of multi-scale targets. Third, it incorporates an occlusion-aware attention mechanism into the detection head, changing the original Head structure to SEAMHead, so that occluded objects are detected effectively. Finally, it introduces a weight adjustment function (Slide Loss) to address sample imbalance. Experimental results show that, compared with YOLOv8, VWE-YOLOv8 increases mAP@0.50 by 1.16% and 1.70% and mAP@0.50:0.95 by 7.36% and 2.13% on the DBS Dataset and the public SCB Dataset, respectively. It also improves precision by 4.17% and 6.74% and recall by 1.96% and 3.13% on the two datasets, respectively. These results indicate that the improved algorithm has higher detection accuracy and stronger generalization capability and is well suited to detecting students' classroom behaviors, which can support smart education applications and aid the digital transformation of education.

Key words: smart education, student behavior detection, object detection, attention mechanism, Large Separable Kernel Attention (LSKA) module
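The abstract names Slide Loss as the weight adjustment function but does not give its form. The sketch below assumes the piecewise weighting published with YOLO-FaceV2, where Slide Loss was introduced: samples are split by their IoU relative to a threshold μ (taken there as the mean IoU of all boxes), and samples near the threshold receive a larger weight. The function names `slide_weight` and `slide_weights` are hypothetical, and how VWE-YOLOv8 sets μ or applies the weight is not stated on this page.

```python
import math
from typing import List


def slide_weight(iou: float, mu: float) -> float:
    """Per-sample weight, assuming the YOLO-FaceV2 piecewise form.

    iou: IoU between a predicted box and its ground-truth box.
    mu:  threshold separating easy and hard samples (assumed: mean IoU).
    """
    if iou <= mu - 0.1:
        # Well below the threshold: weight left unchanged.
        return 1.0
    if iou < mu:
        # Just below the threshold: constant boosted weight.
        return math.exp(1.0 - mu)
    # At or above the threshold: boost decays as IoU approaches 1.
    return math.exp(1.0 - iou)


def slide_weights(ious: List[float]) -> List[float]:
    """Weights for a batch, using the batch mean IoU as mu (an assumption)."""
    if not ious:
        raise ValueError("ious must be non-empty")
    mu = sum(ious) / len(ious)
    return [slide_weight(iou, mu) for iou in ious]


if __name__ == "__main__":
    batch_ious = [0.2, 0.45, 0.55, 0.8]
    for iou, w in zip(batch_ious, slide_weights(batch_ious)):
        print(f"IoU={iou:.2f} -> weight={w:.4f}")
```

In a training loop, such weights would typically multiply the per-sample classification loss, so that borderline samples contribute more to the gradient than clearly easy ones.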


ZENG Yuqi, LIU Bo, ZHONG Baichang, ZHONG Jin. Student Classroom Behavior Detection Algorithm Based on Improved YOLOv8 in Smart Education[J]. Computer Engineering, 2024, 50(9): 344-355.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069597

https://www.ecice06.com/EN/Y2024/V50/I9/344

Figures/Tables 18

Fig.1 The architecture of the object detection network YOLOv8

Fig.2 The structure of VWE-YOLOv8 algorithm for student classroom behavior detection

Fig.3 CSWin Transformer feature extraction network

Fig.4 CSWin Transformer block structure

Fig.5 Cross-shaped window self-attention mechanism

Fig.6 SPPF_LSKA structure

Fig.7 LSKA structure

Fig.8 Separated and enhancement attention module structure

Fig.9 Comparison of the model before and after improvement for hand-raising and turning behaviors

Fig.10 Comparison of the model before and after improvement for hand-raising and listening behaviors

Fig.11 Comparison of the model before and after improvement on multi-scale targets

Fig.12 Performance of the VWE-YOLOv8 model under occlusion and overlap

Fig.13 Performance of the VWE-YOLOv8 model in multi-target recognition


