Computer Engineering ›› 2021, Vol. 47 ›› Issue (1): 44-49. doi: 10.19678/j.issn.1000-3428.0056841

• Artificial Intelligence and Pattern Recognition •

  • Author biography: LI Shibao (born 1978), male, associate professor with a master's degree; his research interests include wireless communication and natural language processing. LI He, ZHAO Qingshuai and YIN Lele are master's students; LIU Jianhang and HUANG Tingpei are associate professors with doctoral degrees.
  • Funding:
    National Natural Science Foundation of China (61972417, 61872385); Fundamental Research Funds for the Central Universities (18CX02134A, 19CX05003A-4, 18CX02137A).

Chinese Textual Entailment Recognition Fused with External Semantic Knowledge

LI Shibao, LI He, ZHAO Qingshuai, YIN Lele, LIU Jianhang, HUANG Tingpei   

  1. College of Oceanography and Space Informatics, China University of Petroleum(East China), Qingdao, Shandong 266580, China
  • Received:2019-12-09 Revised:2020-01-17 Published:2020-02-11



Abstract: Neural network-based textual entailment recognition models typically learn inference knowledge only from training data, which weakens their generalization ability. This paper proposes a Chinese Knowledge Enhanced Inference Model (CKEIM) that fuses external semantic knowledge. Word-level semantic knowledge features are extracted from the HowNet knowledge base to construct an attention weight matrix, while word similarity features and hypernym-hyponym features selected from the CiLin synonym thesaurus form a feature vector. The attention weight matrix and the feature vector are then combined with the encoded text vectors and incorporated into the training of the neural network model, enabling enhanced recognition of Chinese textual entailment. Experimental results show that, compared with the Enhanced Sequential Inference Model (ESIM), CKEIM improves recognition accuracy by 3.7%, 1.5% and 0.9% on 15%, 50% and 100% of the CNLI training set respectively, demonstrating better Chinese textual entailment recognition performance and generalization ability.
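The fusion step the abstract describes (biasing ESIM-style soft co-attention between premise and hypothesis with a knowledge-derived weight matrix) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name `knowledge_enhanced_attention`, the additive fusion with weight `lam`, and the toy knowledge matrix `K` (nonzero where a word pair is related in HowNet/CiLin) are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_enhanced_attention(premise, hypothesis, knowledge, lam=1.0):
    """Soft co-attention between an encoded premise (m x d) and hypothesis
    (n x d), biased by an external knowledge matrix (m x n) whose entries
    mark word pairs related in a knowledge base such as HowNet or CiLin."""
    scores = premise @ hypothesis.T       # (m, n) content-based alignment scores
    scores = scores + lam * knowledge     # add the knowledge-derived bias
    alpha = softmax(scores, axis=1)       # premise-to-hypothesis attention weights
    beta = softmax(scores, axis=0)        # hypothesis-to-premise attention weights
    premise_ctx = alpha @ hypothesis      # (m, d) hypothesis summary per premise word
    hypothesis_ctx = beta.T @ premise     # (n, d) premise summary per hypothesis word
    return premise_ctx, hypothesis_ctx

# Toy example: 3-word premise, 2-word hypothesis, 4-dimensional encodings.
rng = np.random.default_rng(0)
P = rng.standard_normal((3, 4))
H = rng.standard_normal((2, 4))
K = np.array([[1.0, 0.0],               # e.g. premise word 0 is a synonym of
              [0.0, 0.0],               # hypothesis word 0, premise word 2 of
              [0.0, 1.0]])              # hypothesis word 1
p_ctx, h_ctx = knowledge_enhanced_attention(P, H, K)
```

Raising an entry of the knowledge matrix pulls attention toward the corresponding word pair even when the learned encodings alone would not align them, which is how external lexical knowledge can compensate for limited training data.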

Key words: Chinese textual entailment, natural language inference, attention mechanism, Bi-directional Long Short-Term Memory (BiLSTM) network, HowNet, CiLin

CLC number: