
Computer Engineering ›› 2018, Vol. 44 ›› Issue (10): 160-167. doi: 10.19678/j.issn.1000-3428.0048357


Word Similarity Calculation Method Based on Path and CiLin Coding

WANG Songsong, GAO Weixun, XU Yifan

  1. College of Information and Mechatronic Engineering, Shanghai Normal University, Shanghai 200134, China
  • Received: 2017-08-14 Online: 2018-10-15 Published: 2018-10-15

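  • About the authors: WANG Songsong (born 1991), male, master's student, whose main research interests are natural language processing and data mining; GAO Weixun (corresponding author), senior engineer, Ph.D.; XU Yifan, master's student.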

Abstract: Existing word similarity calculation methods focus mainly on the path structure of words and give little in-depth consideration to their semantic information, which leads to inaccurate results. To address this problem, an improved semantic word similarity calculation method is proposed. The method combines a word's CiLin coding with its path structure, and uses a locality-sensitive hashing algorithm together with the Hamming distance to calculate the similarity between CiLin codes. Experimental results on the MC and RG data sets show that the proposed method achieves Pearson correlation coefficients of 0.8974 and 0.8668, respectively, and is more accurate than traditional path-based and depth-based calculation methods.

Key words: synonym, path structure, coding, word similarity, locality-sensitive hashing algorithm, semantics
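
The abstract names three building blocks: a locality-sensitive hash over CiLin codes, the Hamming distance between the resulting fingerprints, and the Pearson correlation coefficient used for evaluation. The following is a minimal sketch of how these pieces might fit together, not the authors' implementation. It assumes a SimHash-style fingerprint as the locality-sensitive hash, an 8-character code with level boundaries after 1, 2, 4, 5 and 7 characters, and illustrative level weights; the function names, the weights, and the example code string are all hypothetical. The combination with path structure is not shown because the abstract gives no formula for it.

```python
import hashlib
import math

BITS = 64  # fingerprint width (an assumption, not taken from the paper)


def code_features(code):
    """Split a code into weighted prefix features, one per hierarchy level.

    Assumed layout: level boundaries after 1, 2, 4, 5 and 7 characters.
    The weights (5..1) are hypothetical and favour the coarser levels.
    """
    cuts = (1, 2, 4, 5, 7)
    weights = (5, 4, 3, 2, 1)
    return [(code[:c], w) for c, w in zip(cuts, weights)]


def simhash(features, bits=BITS):
    """SimHash-style fingerprint: a weighted bitwise vote over feature hashes."""
    totals = [0] * bits
    for token, weight in features:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            totals[i] += weight if (h >> i) & 1 else -weight
    # A fingerprint bit is set when the weighted vote for that bit is positive.
    return sum(1 << i for i, t in enumerate(totals) if t > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def code_similarity(code1, code2, bits=BITS):
    """Map the Hamming distance between fingerprints to a score in [0, 1]."""
    d = hamming(simhash(code_features(code1), bits),
                simhash(code_features(code2), bits))
    return 1.0 - d / bits


def pearson(xs, ys):
    """Pearson correlation between computed scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In an evaluation of the kind the abstract reports, `pearson` would be applied to the list of computed similarities and the list of human ratings for the same word pairs. Codes that share longer prefixes share more features, so their fingerprints tend to differ in fewer bits.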

