Abstract: This paper analyzes several existing Chinese word segmentation methods and proposes a keyword extraction algorithm. The algorithm centers on a term-weight formula whose parameters are trained and optimized with a genetic algorithm, yielding a parameter set suited to Chinese text and improving the precision of subtopic segmentation. Experiments indicate that the algorithm effectively segments the named entities in the extraction system, extracts keywords accurately, and has a degree of generality.
Key words:
text categorization,
word segmentation technology,
keyword extraction,
genetic algorithm
CLC number:
ZHANG Hong (张 虹). Keywords Extraction Algorithm Based on Text Self-motion Categorization (基于自动文本分类的关键词抽取算法)[J]. Computer Engineering (计算机工程), 2009, 35(12): 145-147.
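
The abstract describes tuning the parameters of a term-weight formula with a genetic algorithm but does not state the formula itself. The following is only a minimal sketch of that general idea, under stated assumptions: the linear weight over three features (term frequency, title flag, position score), the toy data in `DOCS`, and the names `weight`, `top_k`, `fitness`, and `evolve` are all hypothetical and are not taken from the paper.

```python
import random

# Toy corpus (invented for illustration). Each document maps a candidate term
# to assumed features (term frequency, in-title flag, position score) and
# comes with a hand-labelled set of gold keywords.
DOCS = [
    ({"genetic algorithm": (0.6, 1.0, 0.8), "segmentation": (0.5, 1.0, 0.6),
      "method": (0.95, 0.0, 0.1), "result": (0.4, 0.0, 0.2)},
     {"genetic algorithm", "segmentation"}),
    ({"keyword": (0.7, 1.0, 0.9), "weight": (0.55, 0.0, 0.9),
      "paper": (0.9, 0.0, 0.0), "show": (0.3, 0.0, 0.1)},
     {"keyword", "weight"}),
]


def weight(features, params):
    """Assumed linear weight formula: dot product of parameters and features."""
    return sum(p * f for p, f in zip(params, features))


def top_k(terms, params, k=2):
    """Return the k highest-weighted terms of one document."""
    ranked = sorted(terms, key=lambda t: weight(terms[t], params), reverse=True)
    return set(ranked[:k])


def fitness(params):
    """Mean fraction of gold keywords recovered in the top-k of each document."""
    scores = [len(top_k(terms, params) & gold) / len(gold) for terms, gold in DOCS]
    return sum(scores) / len(scores)


def evolve(pop_size=20, generations=30, seed=0):
    """Simple elitist genetic algorithm over the three weight parameters."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, 3)           # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(3)                # Gaussian mutation of one gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("tuned parameters:", best, "fitness:", fitness(best))
```

On this toy data, a frequency-only parameter vector `[1.0, 0.0, 0.0]` recovers half of the gold keywords, because one high-frequency generic word outranks a gold term in each document. That is the kind of error the title and position parameters are meant to correct. A real system would replace `DOCS` with features computed from segmented Chinese text and a labelled training set.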