Computer Engineering ›› 2018, Vol. 44 ›› Issue (10): 160-167. doi: 10.19678/j.issn.1000-3428.0048357

• Artificial Intelligence and Recognition Technology •

Word Similarity Calculation Method Based on Path and CiLin Coding

WANG Songsong, GAO Weixun, XU Yifan

  1. College of Information and Mechatronic Engineering, Shanghai Normal University, Shanghai 200134, China
  • Received: 2017-08-14 Online: 2018-10-15 Published: 2018-10-15
  • About the authors: WANG Songsong (born 1991), male, master's student, whose main research interests are natural language processing and data mining; GAO Weixun (corresponding author), senior engineer, Ph.D.; XU Yifan, master's student.

Abstract: Existing word similarity calculation methods focus mainly on the path structure of words and give little consideration to their semantic information, which leads to inaccurate results. To address this problem, an improved method for calculating the semantic similarity of words is proposed. The method combines the CiLin coding of words with their path structure, and uses a locality-sensitive hashing algorithm and the Hamming distance to calculate the similarity between CiLin codes. Experimental results on the MC and RG data sets show that the method achieves Pearson correlation coefficients of 0.8974 and 0.8668, respectively, and is more accurate than traditional path-based and depth-based calculation methods.

Key words: synonym, path structure, coding, word similarity, locality-sensitive hashing algorithm, semantics
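
The abstract states that similarity between CiLin codes is computed with a locality-sensitive hash and the Hamming distance, but it does not give the details. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes the 8-character extended CiLin code layout (for example `Aa01A01=`), uses SimHash as the locality-sensitive hash, and treats the five cumulative level prefixes as features. The function names, the level weights, and the 64-bit fingerprint width are all hypothetical choices made for illustration.

```python
import hashlib

# Assumed layout of an extended CiLin code such as "Aa01A01=":
# level 1 "A", level 2 "a", level 3 "01", level 4 "A", level 5 "01", then a flag.
# LEVEL_ENDS holds the end index of each cumulative level prefix.
LEVEL_ENDS = (1, 2, 4, 5, 7)
BITS = 64  # hypothetical fingerprint width


def level_prefixes(code):
    """Return the five cumulative level prefixes of a CiLin code."""
    return [code[:end] for end in LEVEL_ENDS]


def simhash(features, weights):
    """Build a SimHash fingerprint (a locality-sensitive hash) from weighted features."""
    totals = [0.0] * BITS
    for feature, weight in zip(features, weights):
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        for bit in range(BITS):
            # Each feature votes +weight or -weight on every bit position.
            if (value >> bit) & 1:
                totals[bit] += weight
            else:
                totals[bit] -= weight
    fingerprint = 0
    for bit, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def code_similarity(code_a, code_b, weights=(5.0, 4.0, 3.0, 2.0, 1.0)):
    """Map the Hamming distance between two fingerprints to a score in [0, 1].

    The weights are illustrative only; they give shallower (coarser) levels
    more influence on the fingerprint.
    """
    fp_a = simhash(level_prefixes(code_a), weights)
    fp_b = simhash(level_prefixes(code_b), weights)
    return 1.0 - hamming_distance(fp_a, fp_b) / BITS


if __name__ == "__main__":
    print(code_similarity("Aa01A01=", "Aa01A01="))
    print(code_similarity("Aa01A01=", "Aa01B02="))
    print(code_similarity("Aa01A01=", "Bp12C03="))
```

Identical codes always score 1.0. Codes that share more leading levels share more features, so their fingerprints tend to differ in fewer bits, although the exact scores depend on the hash. Combining this code-level score with a path-based score, as the abstract describes, would require a weighting scheme that is not specified here.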
