
Computer Engineering ›› 2024, Vol. 50 ›› Issue (7): 42-52. doi: 10.19678/j.issn.1000-3428.0069593

• Smart Education •

Automatic Generation and Application of Personalized Experiment Report Comments Based on Large Language Models

Jie ZHAI, Yanhao LI*, Binbin LI, Weibin GUO

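  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237
  • Received: 2024-03-18 Publication date: 2024-07-15 Release date: 2024-07-05
  • Corresponding author: Yanhao LI
  • Funding:
    Shanghai Municipal Key Course Construction Project for Universities (Document No. 沪教委高〔2022〕27号); Shanghai Municipal Education Commission research project; Ministry of Education-Huawei "Intelligent Base" Industry-Education Integration Collaborative Education Base First-Class Course Project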

Personalized Experiment Report Comments Auto-Generation and Application Based on Large Language Models

Jie ZHAI, Yanhao LI*, Binbin LI, Weibin GUO

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received:2024-03-18 Online:2024-07-15 Published:2024-07-05
  • Contact: Yanhao LI

Abstract (translated from the Chinese):

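In reviewing computer experiment reports, evaluation systems vary widely from one experiment to another, fixed comment templates lack personalized content, and evaluation results often come without an interpretable basis. To address these problems, a framework for automatically generating personalized experiment report comments based on large language models is proposed. Using the Theme-Evaluation Decisions-Integrated (T-ED-I) prompt strategy, the evaluation system specific to an experiment is extracted from the teacher's experiment requirements and code quality requirements to form an evaluation decision tree, and a library of evaluation decision trees shared across computer software courses is built. A rating method for experiment-requirement and code-quality themes is designed on the basis of large language models and decision trees: evaluation decision trees that match the content of a student's experiment report are retrieved from the library and combined with the report and code text to automatically generate quantitative or qualitative ratings for the experiment themes and code-quality themes, together with the corresponding interpretable basis. The experimental tasks the student has completed, the theme ratings, and the evaluation basis are then incorporated into an experiment report template to generate personalized comments. Experimental results show that decision trees generated with the T-ED-I prompt strategy are clearly better than those generated without prompts, and that each part of the strategy is effective and reasonable to a certain degree. When the automatically generated ratings are compared with the teachers' original review results, the matching accuracies on examples from software testing, object-oriented programming, and e-commerce finance courses all exceed 90%. Teachers' scores of the automatically generated comments indicate a high level of quality in three dimensions: fluency, relevance, and rationality.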

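Keywords (translated from the Chinese): large language models, experiment evaluation decision tree, personalization, automatic comment generation, code quality evaluation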

Abstract:

When computer experiment reports are reviewed, the assessment systems used for different experiments are diverse and inconsistent. The rigid comment templates used for evaluation lack personalized content, and the results often come without an interpretable basis. To address these issues, this study proposes a framework, based on large language models, for automatically generating personalized experiment report comments. The study employs a Theme-Evaluation Decisions-Integrated (T-ED-I) prompt strategy to extract the evaluation system specific to each experiment from the teacher's requirements regarding the experiment and code quality. The extracted system is expressed as an evaluation decision tree, and the trees are collected into a library shared across computer software courses. The study also introduces a method for grading experiment and code-quality themes based on large language models and decision trees. By retrieving from the library an evaluation decision tree that matches a student's experiment report and combining it with the report and code text, the proposed method automatically generates quantitative or qualitative grading results for the experiment and code-quality themes, along with the corresponding interpretable justifications. Finally, personalized evaluation comments are generated by integrating the student's completed experimental tasks, theme grading results, and evaluation bases into an experiment report template. The experimental results show that the decision trees generated using the T-ED-I prompt strategy are clearly better than those generated without prompting. Ablation studies indicate that each component of the strategy is effective and reasonable. Additionally, when the automatically generated grading results are compared with the original teacher evaluations, the matching accuracies on examples from software testing, object-oriented programming, and e-commerce finance courses all exceed 90%. Moreover, teachers' ratings of the automatically generated comments indicate a high level of quality in terms of fluency, relevance, and rationality.

Key words: large language models, decision trees of experimental evaluation, personalization, comments auto-generation, code quality assessment
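
The abstract describes grading a theme by walking an evaluation decision tree and then filling a comment template with the grade and its basis. The following is only a minimal sketch of that idea, not the authors' implementation: the `Node` class, the `grade_theme` and `render_comment` functions, the example criteria, and the letter grades are all hypothetical, and the plain Python `check` functions stand in for the large language model judgments the paper relies on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One node of a hypothetical evaluation decision tree."""
    question: str = ""                               # criterion checked at this node
    check: Optional[Callable[[dict], bool]] = None   # stand-in for an LLM judgment
    yes: Optional["Node"] = None                     # branch taken when the check passes
    no: Optional["Node"] = None                      # branch taken when the check fails
    grade: Optional[str] = None                      # set only on leaf nodes


def grade_theme(root: Node, report: dict) -> Tuple[str, List[str]]:
    """Walk the tree and return the leaf grade plus the decisions taken."""
    basis: List[str] = []
    node = root
    while node.grade is None:
        passed = node.check(report)
        basis.append(f"{node.question}: {'yes' if passed else 'no'}")
        node = node.yes if passed else node.no
    return node.grade, basis


def render_comment(tasks: List[str], theme: str, grade: str, basis: List[str]) -> str:
    """Fill a fixed comment template with tasks, grade, and basis."""
    return (
        f"Completed tasks: {', '.join(tasks)}. "
        f"{theme}: grade {grade} "
        f"(basis: {'; '.join(basis)})."
    )


# Hypothetical tree for a "test case design" theme (assumed criteria).
tree = Node(
    question="Covers boundary values",
    check=lambda r: r.get("boundary_values", False),
    yes=Node(
        question="Covers invalid inputs",
        check=lambda r: r.get("invalid_inputs", False),
        yes=Node(grade="A"),
        no=Node(grade="B"),
    ),
    no=Node(grade="C"),
)

# Hypothetical facts extracted from one student's report.
report = {"boundary_values": True, "invalid_inputs": False}
grade, basis = grade_theme(tree, report)
comment = render_comment(["equivalence partitioning"], "Test case design", grade, basis)
print(comment)
```

Recording the path through the tree is what gives each grade an explicit basis in this sketch; in the framework described above, both the tree and the per-node judgments would come from the language model rather than from hand-written checks.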