Abstract: Word semantic similarity computation is widely used in information retrieval, text clustering, and word sense disambiguation. This paper proposes a word semantic similarity algorithm based on HowNet. A sememe classification is designed that divides sememes into three classes: the first basic sememe, other basic sememes, and indirect sememes. A variable coefficient for sense (homonym) similarity computation is proposed according to the effect of the different sememe classes. Unlike previous sense similarity methods, the algorithm applies a different sememe similarity computation to each class, according to how strongly that class influences sense similarity. Word semantic similarity is then computed from the sense combinations whose first basic sememes are most similar, which removes the influence of lower-similarity combinations on the result. Experimental results show that the algorithm effectively improves the computational efficiency and precision of word semantic similarity computation.
Key words: sememe, homonym, word semantic similarity, knowledge representation language
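The abstract gives the pipeline but not its formulas, so the following is only a minimal sketch under stated assumptions. The toy sememe hierarchy, the `Sense` record, the class weights, and the distance-based similarity `ALPHA / (d + ALPHA)` are all illustrative choices, not the paper's definitions or HowNet data. The sketch shows the selection step: only the sense pairs with the highest first-basic-sememe similarity are scored in full.

```python
from dataclasses import dataclass
from itertools import product

# Toy sememe hierarchy (child -> parent). Hypothetical, not real HowNet data.
PARENT = {
    "entity": None,
    "animate": "entity",
    "human": "animate",
    "animal": "animate",
    "artifact": "entity",
    "tool": "artifact",
}
ALPHA = 1.6                  # assumed smoothing constant for the distance formula
WEIGHTS = (0.6, 0.25, 0.15)  # assumed weights: first basic, other basic, indirect


@dataclass(frozen=True)
class Sense:
    """One sense of a word, with its sememes split into the three classes."""
    first: str
    others: tuple = ()
    indirect: tuple = ()


def path_to_root(sememe):
    """Return the sememe followed by all of its ancestors."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = PARENT[sememe]
    return path


def sememe_sim(a, b):
    """Distance-based similarity: ALPHA / (d + ALPHA), d = path length in the tree."""
    if a == b:
        return 1.0
    steps_b = {s: i for i, s in enumerate(path_to_root(b))}
    for steps_a, s in enumerate(path_to_root(a)):
        if s in steps_b:  # lowest common ancestor
            return ALPHA / (steps_a + steps_b[s] + ALPHA)
    return 0.0


def set_sim(xs, ys):
    """Symmetric best-match average between two sememe collections."""
    if not xs and not ys:
        return 1.0
    if not xs or not ys:
        return 0.0
    forward = sum(max(sememe_sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(sememe_sim(x, y) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2


def sense_sim(s, t):
    """Weighted combination of the three per-class similarities."""
    w1, w2, w3 = WEIGHTS
    return (w1 * sememe_sim(s.first, t.first)
            + w2 * set_sim(s.others, t.others)
            + w3 * set_sim(s.indirect, t.indirect))


def word_sim(senses_a, senses_b):
    """Fully score only the sense pairs whose first basic sememes are most similar."""
    pairs = list(product(senses_a, senses_b))
    first_sims = [sememe_sim(s.first, t.first) for s, t in pairs]
    best = max(first_sims)
    kept = [p for p, f in zip(pairs, first_sims) if f == best]
    return max(sense_sim(s, t) for s, t in kept)


word_a = [Sense("human", others=("tool",)), Sense("artifact")]
word_b = [Sense("animal"), Sense("tool")]
print(round(word_sim(word_a, word_b), 3))
```

Filtering on the first basic sememe before computing the full sense similarity is what saves work in this sketch: the cheaper comparison runs on every sense pair, and the three-part score runs only on the pairs that survive.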
Chinese Library Classification (CLC) number:
WANG Xiaolin, WANG Dong, YANG Sichun, TAI Weipeng, ZHENG Xiao. Word Semantic Similarity Algorithm Based on HowNet[J]. Computer Engineering, 2014, 40(12): 177-181.