Computer Engineering

• Artificial Intelligence and Recognition Technology •

Method of Word Similarity Computation Based on HowNet 2008

WEI Wei 1,2, XIANG Yang 2

  1. Key Laboratory of Watershed Ecology and Geographical Environment Monitoring of the National Administration of Surveying, Mapping and Geoinformation, College of Electronics and Information Engineering, Jinggangshan University, Ji'an, Jiangxi 343009, China
  2. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2014-08-04 Online: 2015-09-15 Published: 2015-09-15
  • About the authors: WEI Wei (born 1983), male, lecturer and Ph.D. candidate; research interests: natural language processing and artificial intelligence. XIANG Yang, professor and doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (61363014, 71171148); Natural Science Foundation of Jiangxi Province (20151BAB207016).

Abstract: Word similarity computation is an important problem in natural language processing, with wide application in machine translation, information retrieval, and text classification. This paper analyzes and exploits the new edition of the semantic dictionary HowNet (2008) and computes word similarity from two aspects: the main sememe of a concept and the characteristic description of a concept. Using the hierarchical structure of the sememe tree, the method derives the depth information quantity of each sememe and combines it with the path between sememes to obtain the sememe similarity. The similarity between two concept characteristic descriptions is computed by hierarchical matching of characteristic types. Word similarity is then obtained by combining the main-sememe similarity, the characteristic-description similarity, and the converse and antonym relations between sememes. Experimental results show that the word similarity values produced by this method tend to agree with human subjective judgment.

Key words: word similarity, HowNet 2008, sememe, depth information quantity, path, characteristic description
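
The abstract does not give the paper's formulas, so the following is only a minimal illustrative sketch of the general idea of combining depth and path information in a sememe tree. Everything in it is an assumption: the toy tree and its sememe names, the Wu-Palmer-style depth term, the path-decay term with parameter `alpha`, and the weight `beta` used to blend main-sememe and characteristic-description similarity. The converse and antonym relations mentioned in the abstract are not modeled.

```python
# Hypothetical toy sememe tree (child -> parent); not taken from HowNet 2008.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "artifact": "thing",
}


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    """Depth of a sememe, counting the root as depth 1."""
    return len(ancestors(node))


def lowest_common_ancestor(a, b):
    """Deepest sememe that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("sememes are not in the same tree")


def path_length(a, b):
    """Number of edges on the tree path between a and b."""
    lca = lowest_common_ancestor(a, b)
    return depth(a) + depth(b) - 2 * depth(lca)


def sememe_similarity(a, b, alpha=1.6):
    """Assumed measure: Wu-Palmer-style depth term times a path-decay term."""
    lca = lowest_common_ancestor(a, b)
    depth_term = 2 * depth(lca) / (depth(a) + depth(b))
    path_term = alpha / (alpha + path_length(a, b))
    return depth_term * path_term


def word_similarity(main_sim, desc_sim, beta=0.7):
    """Assumed linear blend of main-sememe and description similarity."""
    return beta * main_sim + (1 - beta) * desc_sim


if __name__ == "__main__":
    for pair in [("human", "human"), ("human", "animal"), ("human", "artifact")]:
        print(pair, round(sememe_similarity(*pair), 4))
```

In this sketch, sememes that share a deeper common ancestor and lie closer together in the tree score higher, which is the qualitative behavior the abstract describes; the actual depth information quantity and weighting used in the paper may differ.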

CLC number: