Student Classroom Behavior Detection Algorithm Based on Improved YOLOv8 in Smart Education

doi:10.19678/j.issn.1000-3428.0069597

Abstract:

As education undergoes digital transformation, integrating artificial intelligence into the whole teaching and learning process for precise behavior analysis and empirical application has become a research hotspot. To address the problems of low detection accuracy, dense bounding boxes, severe overlap and occlusion, large scale variation, and data imbalance in student classroom behavior detection, this paper builds a student classroom behavior dataset (DBS Dataset) and proposes VWE-YOLOv8, a student classroom behavior detection algorithm based on an improved YOLOv8. First, the CSWin-Transformer attention mechanism is introduced to strengthen the model's extraction of global image information and improve the network's detection accuracy. Second, the Large Separable Kernel Attention (LSKA) module is integrated into the SPPF structure to improve recognition of multi-scale targets. Third, an occlusion-aware attention mechanism is built into the detection head, changing the original Head structure into SEAMHead, so that occluded objects are detected effectively. Finally, the weight adjustment function Slide Loss is introduced to handle sample imbalance. The experimental results show that, compared with YOLOv8, VWE-YOLOv8 increases mAP@0.50 by 1.16% and 1.70%, mAP@0.50:0.95 by 7.36% and 2.13%, precision by 4.17% and 6.74%, and recall by 1.96% and 3.13% on the DBS Dataset and the public SCB Dataset, respectively. These results indicate that the algorithm has higher detection accuracy and strong generalization capability, is suited to the task of detecting student classroom behavior, and can support smart education applications and the digital transformation of education.

Keywords: smart education, student behavior detection, object detection, attention mechanism, Large Separable Kernel Attention (LSKA) module
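The abstract names Slide Loss as the weight adjustment function but does not state its form. The following is a minimal sketch, assuming the weighting published with YOLO-FaceV2, where the threshold `mu` is the mean IoU of the predicted boxes and samples near that threshold receive a larger weight. The function name `slide_weight` is hypothetical, and whether VWE-YOLOv8 uses exactly this form is an assumption, not something this page confirms.

```python
import math


def slide_weight(iou: float, mu: float) -> float:
    """Per-sample weight in the assumed Slide Loss formulation.

    iou: IoU between a predicted box and its ground-truth box.
    mu:  threshold separating easy and hard samples (assumed here to be
         the mean IoU over all predicted boxes).
    """
    if iou <= mu - 0.1:
        # Clearly below the threshold: weight left unchanged.
        return 1.0
    if iou < mu:
        # Just below the threshold: constant boosted weight.
        return math.exp(1.0 - mu)
    # At or above the threshold: boost decays to 1.0 as IoU approaches 1.
    return math.exp(1.0 - iou)


if __name__ == "__main__":
    mu = 0.5  # illustrative threshold, not a value from the paper
    for iou in (0.2, 0.45, 0.5, 0.8, 1.0):
        print(f"iou={iou:.2f} weight={slide_weight(iou, mu):.4f}")
```

In that formulation the weight multiplies the per-sample loss term, so samples whose IoU lies close to the threshold contribute more to training than clearly easy or clearly hard ones.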

ZENG Yuqi, LIU Bo, ZHONG Baichang, ZHONG Jin. Student Classroom Behavior Detection Algorithm Based on Improved YOLOv8 in Smart Education[J]. Computer Engineering, 2024, 50(9): 344-355.

https://www.ecice06.com/CN/Y2024/V50/I9/344

Figures/Tables (18)

Fig.1 Architecture of the YOLOv8 object detection network

Fig.2 Structure of the VWE-YOLOv8 algorithm for student classroom behavior detection

Fig.3 Cross-shaped window Transformer (CSWin Transformer) feature extraction network

Fig.4 CSWin Transformer block structure

Fig.5 Cross-shaped window self-attention mechanism

Fig.6 SPPF_LSKA structure

Fig.7 LSKA structure

Fig.8 Structure of the Separated and Enhancement Attention Module (SEAM)

Fig.9 Comparison of the model before and after improvement on hand-raising and turning-around behaviors

Fig.10 Comparison of the model before and after improvement on hand-raising and listening behaviors

Fig.11 Comparison of the model before and after improvement on multi-scale targets

Fig.12 Performance of the VWE-YOLOv8 model under occlusion and overlap

Fig.13 Performance of the VWE-YOLOv8 model in multi-target recognition

