Recognition of Chinese Lexical Entailment Relation Based on Word Vector

doi:10.3969/j.issn.1000-3428.2016.02.031

Computer Engineering

ZHANG Zhichang, ZHOU Huixia, YAO Dongren, LU Xiaoyong

(School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)

Received: 2015-08-17 Online: 2016-02-15 Published: 2016-01-29
About the authors: ZHANG Zhichang (born 1976), male, associate professor, Ph.D.; main research interests are natural language processing and Web mining. ZHOU Huixia and YAO Dongren are master's students. LU Xiaoyong is an engineer.
Funding:
National Natural Science Foundation of China (61163039, 61163036, 61363058); Northwest Normal University Young Teachers' Research Capability Promotion Program (NWNU-LKQN-10-2, NWNU-LKQN-12-23).

Abstract

Abstract: Recognition of lexical entailment relations in English has been studied extensively, and many recognition models have been proposed, but the acquisition of lexical entailment relations for Chinese has received little attention. This paper therefore proposes a method for recognizing Chinese lexical entailment relations based on word vectors. Word vectors are trained on a Chinese Wikipedia corpus so that each word is represented as a vector, a set of classification features is designed on top of these vectors, and a Support Vector Machine (SVM) model for classifying lexical entailment between Chinese nouns is trained on a manually created Chinese lexical entailment data set. Experimental results show that, compared with the traditional cosine similarity method, the proposed method and the designed classification features have a clear advantage in recognizing lexical entailment relations.

Key words: textual entailment, lexical entailment, word vector, entailment feature, Support Vector Machine (SVM)
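The contrast drawn in the abstract between cosine similarity and word-vector classification features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional toy vectors, the word pair, and the `pair_features` helper (concatenation, difference, and cosine) are hypothetical and are not the features designed in the paper, which the abstract does not list. The point is only that cosine similarity is symmetric, while entailment is directional, so an ordered-pair feature vector gives a classifier information that a single similarity score cannot.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_features(u, v):
    """Hypothetical features for the ordered pair (u, v).

    Concatenates u, v, the difference v - u, and the cosine score.
    This is an illustrative choice, not the paper's feature set.
    """
    diff = [b - a for a, b in zip(u, v)]
    return list(u) + list(v) + diff + [cosine(u, v)]


# Toy vectors (assumed values; real word vectors would be learned
# from a corpus such as Chinese Wikipedia and have many more dimensions).
vectors = {
    "dog": [0.9, 0.1, 0.3],
    "animal": [0.7, 0.4, 0.5],
}

dog, animal = vectors["dog"], vectors["animal"]

# The cosine baseline gives the same score in both directions.
sim_forward = cosine(dog, animal)
sim_backward = cosine(animal, dog)

# The ordered-pair features differ when the direction is reversed.
forward = pair_features(dog, animal)    # candidate: "dog" entails "animal"
backward = pair_features(animal, dog)   # candidate: "animal" entails "dog"

print(sim_forward == sim_backward)
print(forward != backward)
```

In a full pipeline, feature vectors like these would be labeled as entailing or non-entailing pairs and passed to an SVM implementation (for example, scikit-learn's `sklearn.svm.SVC`) for training.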

CLC Number:

TP391

ZHANG Zhichang, ZHOU Huixia, YAO Dongren, LU Xiaoyong. Recognition of Chinese Lexical Entailment Relation Based on Word Vector[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2016.02.031.

http://www.ecice06.com/CN/Y2016/V42/I2/169

Editor: SUO Shuzhi
