YANG Xingrui, MA Bin, LI Senyao, ZHONG Xian
Accepted: 2024-04-19
In the field of natural language processing, large language models are developing rapidly. However, their application to educational digitization still faces a series of important challenges. To address the scarcity of domain-specific data and the unstable summarization that leads to information loss or redundancy, a lightweight idempotent model framework, IGLM, is introduced for educational text summarization. The model first employs multi-source training with adaptive augmentation to enhance data diversity. Various fine-tuning procedures are then applied to the downstream text summarization task. In addition, an idempotent summary generation strategy is designed to mitigate the impact of text length: initial summaries are drawn closer to idempotent summaries, which constrains the model and reduces biases arising from uneven language corpora. Combined with quantization techniques, the framework generates more precise and fluent summaries under low-resource conditions. Experiments use ROUGE F1 scores as the evaluation metric and are validated on the publicly available Chinese text summarization datasets LCSTS, EDUCATION, and NLPCC. The results show significant gains in precision and coherence. Compared with the baseline model, ROUGE-1/2/L scores increase by 7.9, 7.4, and 8.7 points on LCSTS; by 12.9, 15.4, and 15.7 points on EDUCATION; and by 12.2, 11.7, and 12.7 points on NLPCC. These results confirm the framework's efficacy and offer a robust solution for educational digitization tasks.
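To make the idempotence idea concrete, the sketch below is a minimal illustration rather than the paper's actual training objective: a summarizer is applied repeatedly until re-summarizing the summary barely changes it, with closeness measured by a character-level ROUGE-L F1 score (a common choice for Chinese text). The `summarize` stub, the `threshold`, and the `max_rounds` cap are all assumptions introduced for this example.

```python
from typing import Callable

def lcs_length(a: list, b: list) -> int:
    # Dynamic-programming longest common subsequence, the core of ROUGE-L.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    # Character-level ROUGE-L F1 between two strings.
    c, r = list(candidate), list(reference)
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def idempotent_summarize(text: str,
                         summarize: Callable[[str], str],
                         threshold: float = 0.95,
                         max_rounds: int = 3) -> str:
    # Re-summarize until the output is approximately a fixed point:
    # summarize(summary) ~ summary, i.e. the summary is idempotent.
    summary = summarize(text)
    for _ in range(max_rounds):
        resummary = summarize(summary)
        if rouge_l_f1(resummary, summary) >= threshold:
            break  # stable: further summarization barely changes the text
        summary = resummary
    return summary

if __name__ == "__main__":
    # Toy summarizer (assumption for the demo): keep only the first sentence.
    def toy(t: str) -> str:
        return t.split("。")[0] + "。" if "。" in t else t

    print(idempotent_summarize("第一句。第二句。第三句。", toy))
```

In IGLM itself, as described in the abstract, the closeness between an initial summary and its idempotent counterpart is enforced as a training-time constraint; the inference-time loop above is only one simple way to exercise the same property.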