Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

Recognition of Chinese Lexical Entailment Relation Based on Word Vector

ZHANG Zhichang, ZHOU Huixia, YAO Dongren, LU Xiaoyong

  1. (School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2015-08-17 Online: 2016-02-15 Published: 2016-01-29
  • About the authors: ZHANG Zhichang (born 1976), male, associate professor, Ph.D.; research interests: natural language semantic processing and Web mining. ZHOU Huixia and YAO Dongren, master's students. LU Xiaoyong, engineer.
  • Funding:
    National Natural Science Foundation of China (61163039, 61163036, 61363058); Northwest Normal University Young Teachers' Research Capability Improvement Program (NWNU-LKQN-10-2, NWNU-LKQN-12-23).

Abstract: Recognition of lexical entailment relations in English has been studied extensively, and many recognition models have been proposed, but lexical entailment in Chinese has received little attention. This paper therefore proposes a method for recognizing Chinese lexical entailment relations. Word vectors are trained on a Chinese Wikipedia corpus so that each word is represented as a vector, a set of classification features is designed on top of these word vectors, and a Support Vector Machine (SVM) model for classifying lexical entailment between Chinese nouns is trained on a manually created Chinese lexical entailment data set. Experimental results show that, compared with the traditional cosine similarity method, the proposed method and its classification features perform markedly better at recognizing lexical entailment relations.

Key words: textual entailment, lexical entailment, word vector, entailment feature, Support Vector Machine (SVM)
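The pipeline summarized in the abstract (word vectors, then vector-based pair features, then an SVM, compared against a cosine similarity baseline) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not list the actual features, so the feature set below (the vector difference plus the cosine similarity) is a guess at one plausible choice; the three-dimensional vectors and English word pairs are hypothetical stand-ins for vectors trained on Chinese Wikipedia; and the small hand-written linear SVM stands in for whatever SVM implementation the authors used.

```python
import math

# Hypothetical 3-d "word vectors"; a real system would train them on a corpus.
VECTORS = {
    "dog": [1.0, 0.2, 0.1], "animal": [0.8, 0.3, 0.9],
    "rose": [0.1, 1.0, 0.2], "plant": [0.2, 0.8, 0.9],
    "car": [0.5, 0.5, 0.1], "vehicle": [0.4, 0.6, 0.8],
}
# (hyponym, hypernym) pairs: the first word entails the second.
ENTAILING = [("dog", "animal"), ("rose", "plant"), ("car", "vehicle")]


def cosine(u, v):
    """Cosine similarity; symmetric, so it cannot encode direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_features(u, v):
    """Assumed feature set: the difference u - v plus the cosine similarity."""
    return [a - b for a, b in zip(u, v)] + [cosine(u, v)]


def build_dataset(pairs):
    """Each pair gives a positive example; the reversed pair gives a negative."""
    X, y = [], []
    for hypo, hyper in pairs:
        X.append(pair_features(VECTORS[hypo], VECTORS[hyper]))
        y.append(1)
        X.append(pair_features(VECTORS[hyper], VECTORS[hypo]))
        y.append(-1)
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via subgradient descent on L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: regularisation step plus hinge-loss step.
                w = [wi - lr * (lam * wi - t * xi) for wi, xi in zip(w, x)]
                b += lr * t
            else:
                # Outside the margin: only the regularisation step applies.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (entails) or -1 (does not entail)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


X, y = build_dataset(ENTAILING)
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print("labels:     ", y)
print("predictions:", predictions)
```

The sketch also hints at why a cosine-only baseline struggles with this task: cosine similarity gives the same score to (dog, animal) and (animal, dog), whereas entailment is directional, so a classifier needs asymmetric features such as the vector difference to tell the two orders apart.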
