Computer Engineering ›› 2009, Vol. 35 ›› Issue (12): 145-147. doi: 10.3969/j.issn.1000-3428.2009.12.051

• Artificial Intelligence and Recognition Technology •

Keyword Extraction Algorithm Based on Automatic Text Categorization

ZHANG Hong   

  1. (School of Computer and Communication Engineering, Weifang University, Weifang 261061)
  • Online: 2009-06-20 Published: 2009-06-20

Abstract: This paper analyzes several existing Chinese word segmentation methods and proposes a keyword extraction algorithm centered on a word weight formula. The parameters of the formula are trained and optimized with a genetic algorithm, which yields a set of parameters suited to Chinese text and improves the precision of subtopic segmentation within an article. Experimental analysis shows that the algorithm effectively segments the named entities in the extraction system, extracts keywords accurately, and has a certain degree of generality.

Key words: text categorization, word segmentation technology, keyword extraction, genetic algorithm
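
The abstract does not give the weight formula or the genetic algorithm settings, so the following is only a minimal sketch of the general idea: tuning the parameters of a word weight formula with a genetic algorithm. Everything in it is an assumption made for illustration, not the paper's method: the linear form of `word_weight`, the three per-word features, the toy `SAMPLES` data, the `THRESHOLD` value, and the choice of elitist selection, one-point crossover and Gaussian mutation.

```python
import random

# Hypothetical training data: (term frequency, in title, in first sentence)
# for each word, plus a label (1 = marked as a keyword, 0 = not a keyword).
SAMPLES = [
    ((0.9, 1.0, 1.0), 1),
    ((0.8, 1.0, 0.0), 1),
    ((0.7, 0.0, 1.0), 1),
    ((0.9, 0.0, 0.0), 0),
    ((0.3, 0.0, 0.0), 0),
    ((0.2, 0.0, 1.0), 0),
]

THRESHOLD = 0.5  # assumed cut-off: a word is a keyword if its weight reaches this


def word_weight(params, features):
    """Assumed linear weight formula: sum of parameter * feature."""
    return sum(p * f for p, f in zip(params, features))


def fitness(params, samples):
    """Count the samples whose keyword / non-keyword label is predicted correctly."""
    return sum(
        (word_weight(params, feats) >= THRESHOLD) == bool(label)
        for feats, label in samples
    )


def evolve(samples, n_params=3, pop_size=20, generations=30,
           mutation_rate=0.2, seed=0):
    """Return (best parameter vector, best fitness recorded per generation)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, samples), reverse=True)
        history.append(fitness(population[0], samples))
        # Selection: the better half survives unchanged (elitism).
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            cut = rng.randrange(1, n_params)  # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n_params):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clamped to the range [0, 1].
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = elite + children
    best = max(population, key=lambda ind: fitness(ind, samples))
    history.append(fitness(best, samples))
    return best, history
```

Calling `best, history = evolve(SAMPLES)` returns the best parameter vector found and the best fitness per generation. Because the better half of each generation is carried over unchanged, the recorded fitness never decreases. A real system would replace the toy samples with features computed from segmented Chinese text and labelled keywords.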
