
Computer Engineering, 2010, Vol. 36, Issue (19): 93-95. doi: 10.3969/j.issn.1000-3428.2010.19.032

• Software Technology and Database •

Chinese Keyword Extraction Algorithm Based on Synonym Chains

ZHANG Ying-ying, XIE Qiang, DING Qiu-lin

  1. (College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
  • Publication date: 2010-10-05  Release date: 2010-09-27
  • About the authors: ZHANG Ying-ying (1984-), female, master's student, main research interests: knowledge engineering, information system integration, human-computer interaction; XIE Qiang, associate professor, Ph.D.; DING Qiu-lin, professor, Ph.D. supervisor

Abstract: Traditional Chinese keyword extraction pays little attention to word semantics and synonyms, which leads to low precision and recall. To address this, a Chinese keyword extraction algorithm based on synonym chains is proposed. The algorithm uses a context window together with a word sense disambiguation algorithm to determine the sense of each word in its context. It then builds synonym chains from the synonyms that occur in the document, which simplifies the selection of candidate words. Finally, a weighting formula derived from the characteristics of the synonym chains is used to filter the candidate words. Experimental results show that on documents containing many synonyms the algorithm achieves considerably higher precision and recall, and that its average performance is also clearly improved.

Key words: keyword extraction, synonym chains, semantics, disambiguation
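The abstract names the pipeline stages (context-window disambiguation, synonym-chain construction, chain-based weighting and filtering) but gives neither the window size, the disambiguation algorithm, the synonym resource, nor the weighting formula. The following is therefore only a minimal sketch of how such a pipeline could fit together, under loudly stated assumptions: the text is already word-segmented; a tiny hypothetical lexicon, LEXICON, maps each word to sense IDs that act as synonym-set codes; disambiguation is a simplified Lesk-style overlap count within a ±window context (a swapped-in technique, not necessarily the authors'); and chain_weight is an invented illustrative formula, not the paper's.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL toy lexicon (an assumption, not the paper's resource):
# word -> {sense_id: signature words}. Words sharing a sense_id are treated
# as synonyms, similar to words sharing a thesaurus class code.
LEXICON = {
    "关键词": {"K": {"提取", "文档", "抽取"}},
    "关键字": {"K": {"提取", "文档", "抽取"}},
    "提取": {"E": {"关键词", "关键字", "文本"}},
    "抽取": {"E": {"关键词", "关键字", "文本"}},
    "文档": {"D": {"文本", "关键词"}},
    "文本": {"D": {"文档", "关键字"}},
    # One ambiguous word with two senses, to exercise disambiguation.
    "苹果": {"FRUIT": {"香蕉", "水果", "吃"}, "COMPANY": {"手机", "公司", "发布"}},
}


def disambiguate(tokens, window=3):
    """Choose one sense per known token with a simplified Lesk overlap:
    the sense whose signature shares the most words with the +/-window
    context wins; max() keeps the first listed sense on ties."""
    senses = []
    for i, tok in enumerate(tokens):
        options = LEXICON.get(tok)
        if not options:
            senses.append(None)  # words outside the lexicon join no chain
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        senses.append(max(options, key=lambda s: len(options[s] & context)))
    return senses


def build_chains(tokens, senses):
    """Group (position, word) pairs by chosen sense: one chain per synonym set."""
    chains = defaultdict(list)
    for pos, (tok, sense) in enumerate(zip(tokens, senses)):
        if sense is not None:
            chains[sense].append((pos, tok))
    return dict(chains)


def chain_weight(chain, n_tokens, alpha=0.5, beta=1.0):
    """HYPOTHETICAL weight, NOT the paper's formula: occurrence count, plus a
    bonus per extra distinct synonym, plus a bonus for an early first mention."""
    occurrences = len(chain)
    distinct = len({tok for _, tok in chain})
    first_pos = chain[0][0]  # chains are built in position order
    return occurrences + alpha * (distinct - 1) + beta * (1 - first_pos / n_tokens)


def extract_keywords(tokens, top_k=2, window=3):
    """Rank chains by weight, keep the top_k, and represent each retained
    chain by its most frequent member (first seen wins on ties)."""
    chains = build_chains(tokens, disambiguate(tokens, window))
    ranked = sorted(chains.values(),
                    key=lambda c: chain_weight(c, len(tokens)), reverse=True)
    return [Counter(tok for _, tok in c).most_common(1)[0][0] for c in ranked[:top_k]]


# Usage on a pre-segmented toy document.
DOC = ["关键词", "提取", "是", "文本", "处理", "任务", "关键字",
       "抽取", "需要", "关键词", "文档", "苹果", "发布", "手机"]
print(extract_keywords(DOC))  # → ['关键词', '提取']
```

In this toy run, 关键词 and 关键字 fall into one chain, so the synonym 关键字 strengthens the chain represented by 关键词 instead of competing with it as a separate candidate, and 苹果 resolves to the COMPANY sense because 发布 and 手机 lie inside its window. A real system would take the lexicon from a thesaurus and the weight from the authors' own formula, neither of which is reproduced here.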
