
Computer Engineering ›› 2012, Vol. 38 ›› Issue (01): 1-4. doi: 10.3969/j.issn.1000-3428.2012.01.001

• Special Column •

Semantic-based Keyword Extraction Algorithm for Chinese Text

WANG Li-xia 1,2, HUAI Xiao-yong 1

  (1. National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China)
  • Received: 2011-07-05 Online: 2012-01-05 Published: 2012-01-05
  • About the authors: WANG Li-xia (born 1986), female, master's student, main research interests: Chinese information processing and data mining; HUAI Xiao-yong, senior engineer
  • Supported by:

    National Natural Science Foundation of China (90920010); National "863" Program of China (2008AA01Z145)

Abstract:

Traditional keyword extraction algorithms are limited to literal matching and lack semantic understanding. To overcome these shortcomings, this paper proposes a Semantic-based Keyword Extraction (SKE) algorithm for Chinese text. The algorithm incorporates the semantic features of words into the extraction process: it constructs a word semantic similarity network and uses betweenness centrality density to measure the semantic importance of each word. Experimental results show that, compared with keyword extraction algorithms based on statistical features, the keywords extracted by SKE reflect the document's topic and agree better with human judgment, and that the algorithm performs better.

Key words: keyword extraction, semantic similarity, word semantic similarity network, betweenness centrality, Chinese text
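
The general idea named in the abstract, ranking words by their position in a semantic similarity network, can be illustrated with a minimal sketch. It is not the authors' SKE implementation. The abstract does not define "betweenness centrality density", so plain unnormalized betweenness centrality (computed with Brandes' algorithm) stands in for it here. The word list, the similarity scores in `TOY_SIMILARITY`, and the `threshold` value are all hypothetical and chosen only to keep the example small.

```python
from collections import deque
from itertools import combinations


def build_similarity_graph(words, similarity, threshold):
    """Link two words when their similarity score reaches the threshold."""
    graph = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if similarity(a, b) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []                        # vertices in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                      # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}   # dependency of source on each vertex
        while stack:                      # accumulate from farthest to nearest
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: s / 2.0 for v, s in score.items()}


# Hypothetical similarity scores, for illustration only (not from the paper).
TOY_SIMILARITY = {
    frozenset(("keyword", "term")): 0.9,
    frozenset(("keyword", "topic")): 0.8,
    frozenset(("keyword", "document")): 0.7,
    frozenset(("document", "text")): 0.9,
}


def toy_similarity(a, b):
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


words = ["keyword", "term", "topic", "document", "text"]
graph = build_similarity_graph(words, toy_similarity, threshold=0.6)
scores = betweenness_centrality(graph)
ranked = sorted(words, key=lambda w: scores[w], reverse=True)
print(ranked[:2])  # ['keyword', 'document']
```

In this toy network, "keyword" lies on the shortest path between five word pairs and "document" on three, so they rank first and second. A real system would replace `toy_similarity` with a lexical-resource or embedding-based similarity measure and would need word segmentation for Chinese text before building the network.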

CLC Number: