
Computer Engineering ›› 2024, Vol. 50 ›› Issue (10): 110-118. doi: 10.19678/j.issn.1000-3428.0068669

• Artificial Intelligence and Pattern Recognition •

Multidimensional Feature Named Entity Recognition Method in Education Domain

REN Yi, SU Bo*, YUAN Shuai

  1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, Liaoning, China
  • Received: 2023-10-24 Publication date: 2024-10-15 Release date: 2024-03-06
  • Contact: SU Bo
  • Supported by:
    National Natural Science Foundation of China (62073227); Liaoning Provincial Department of Education Fund (LJKZ0581); Liaoning Provincial Department of Education Fund (LJKZ0584)

Abstract:

Advances in information technology have made "Internet + Education" a research hotspot in the education field, and every stage of education and teaching is moving toward intelligent methods. Research on Named Entity Recognition (NER) for secondary school mathematics can lay the foundation for subsequently building a secondary school mathematics knowledge graph and for tasks such as automatic question answering, thereby meeting secondary school students' demand for personalized knowledge acquisition and supporting the construction of a new intelligent education system. At present, secondary school mathematics knowledge is semantically complex, NER research and data for it are scarce, and current mainstream models tend to overlook some local features during feature extraction. To address the difficulty of entity recognition in this field, a multidimensional feature NER method incorporating multi-head attention is proposed, using a self-constructed corpus of secondary school mathematics knowledge as the research object. First, the method uses Bidirectional Encoder Representations from Transformers (BERT) to pre-train text representations and obtain word vectors. Next, it introduces adversarial training to perturb each embedding vector and passes the resulting adversarial samples, together with the embedding vectors, to the multidimensional feature extraction layer. The extracted features are then concatenated, dynamically fused via the multi-head attention mechanism, and finally output after correction by a Conditional Random Field (CRF). Experimental results show that the method achieves a precision, recall, and F1 value of 96.68%, 97.71%, and 97.19%, respectively, on the self-constructed Educ dataset, demonstrating its effectiveness for recognizing secondary school mathematics knowledge entities.
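The three reported figures can be checked against each other. The sketch below assumes the first figure (96.68%) is precision in the usual NER sense, and that the F1 value is the standard harmonic mean of precision and recall; the abstract does not state how the metrics were computed, so the function name `f1_score` and this reading are illustrative assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, in percent.
# Assumption: the first figure is precision, not token-level accuracy.
precision, recall = 96.68, 97.71

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 97.19%"
```

Under that assumption the computed value agrees with the reported 97.19%, which suggests the three numbers are internally consistent.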

Key words: Named Entity Recognition (NER), educational domain, adversarial training, multidimensional feature extraction, multi-head attention mechanism
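The abstract says adversarial training perturbs each embedding vector but does not name the perturbation scheme. As a rough illustration only, the sketch below uses a Fast Gradient Method (FGM) style step, which is a swapped-in assumption and may not be what the authors used. The function names, the toy embedding, the toy gradient, and the `epsilon` value are all hypothetical.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Scale the loss gradient to L2 norm `epsilon` (FGM-style step).

    Assumption: `grad` is the gradient of the training loss with respect
    to one embedding vector; here it is simply supplied by hand.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal, so leave the embedding unperturbed.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb_embedding(embedding, grad, epsilon=1.0):
    """Return the adversarial sample: embedding plus the scaled gradient."""
    delta = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, delta)]


# Toy values, not taken from the paper.
embedding = [0.2, -0.5, 0.1]
grad = [3.0, 0.0, -4.0]

adversarial = perturb_embedding(embedding, grad, epsilon=0.5)
print(adversarial)
```

In a pipeline like the one described, both `embedding` and `adversarial` would then be passed to the feature extraction layer; how the two are combined in the loss is not specified in the abstract.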