作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2007, Vol. 33 ›› Issue (06): 191-194. doi: 10.3969/j.issn.1000-3428.2007.06.067

• 人工智能及识别技术 • 上一篇    下一篇

汉语词语语义相似度计算研究

夏 天   

  1. (中国人民大学信息资源管理学院,北京 100872)
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2007-03-20 发布日期:2007-03-20

Study on Chinese Words Semantic Similarity Computation

XIA Tian   

  1. (School of Information Resource Management, Renmin University of China, Beijing 100872)
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-03-20 Published:2007-03-20

摘要: 汉语词语的语义相似度计算是中文信息处理中的一个关键问题。该文提出了一种基于知网、面向语义、可扩展的相似度计算新方法,该方法从信息论的角度出发,定义了知网义原间的相似度计算公式,通过对未登录词进行概念切分和语义自动生成,解决了未登录词无法参与语义计算的难题,实现了任意词语在语义层面上的相似度计算。针对同义词词林的实验结果表明,该方法的准确率比现有方法高出近15个百分点。

关键词: 词语相似度, 知网, 概念, 义原

Abstract: Similarity computation of Chinese words is a key problem in Chinese information processing. This paper proposes a new method on similarity computation which is based on Hownet, geared to semantic and could be expanded. The new method defines a similarity computation formula among Hownet’s sememes according to information theory, finds a way out of the difficulty that OOV words cannot participate in semantic computation by implementing concept segmentation and automatic semantic production to OOV words, and realizes the similarity computation on the semantic level among arbitrary words finally. Experimental result of CILIN indicates that the accuracy rate of the new method is nearly 15% higher than present ones.

Key words: Words similarity, Hownet, Concept, Sememe