Abstract: To improve the accuracy of Chinese keyword extraction, this paper proposes a Chinese keyword extraction algorithm based on a competitive learning network. The article is first segmented into individual words or phrases, each of which is treated as a single neuron. These neurons are fed into the input layer of the competitive learning network, and mutual competition among the neurons on the competitive layer yields one or more active neurons. The keywords of the article are then obtained by merging weights and applying cluster analysis. Experimental results show that the average hit rate of the keywords extracted by this algorithm is higher than that of the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm and the traditional Term Frequency (TF) algorithm, and that the algorithm shows good robustness.
Key words: keyword extraction, average hit rate, competitive learning network, neuron, input layer, competitive layer
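The abstract only outlines the pipeline, so the following is a minimal Python sketch under explicit assumptions, not the authors' implementation. It assumes the text has already been segmented into words (a real system would run a Chinese word segmenter first), encodes each distinct word with three hypothetical features (relative frequency, earliness of first occurrence, relative length), trains a small winner-take-all competitive layer, and ranks words by the summed weights of the neuron they activate. That ranking is an assumed stand-in for the paper's weight-merging and clustering step, which the abstract does not specify. The TF and TF-IDF baselines and the hit-rate definition (share of reference keywords recovered) are likewise assumptions, and all function names are illustrative.

```python
import math
from collections import Counter


def word_features(tokens):
    """Map each distinct word to a hypothetical 3-feature input vector:
    relative term frequency, earliness of first occurrence, relative length.
    (Assumed features; the abstract does not say how words are encoded.)"""
    counts = Counter(tokens)
    max_count = max(counts.values())
    max_len = max(len(w) for w in counts)
    first_pos = {}
    for i, tok in enumerate(tokens):
        first_pos.setdefault(tok, i)
    n = len(tokens)
    return {
        w: [
            counts[w] / max_count,    # in (0, 1]; the most frequent word -> 1
            1.0 - first_pos[w] / n,   # words that appear earlier score higher
            len(w) / max_len,         # longer words/phrases score higher
        ]
        for w in counts
    }


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner(x, weights):
    """Index of the competitive-layer neuron closest to input x."""
    return min(range(len(weights)), key=lambda j: sq_dist(x, weights[j]))


def train_competitive(vectors, n_neurons=3, lr=0.3, epochs=20):
    """Winner-take-all competitive learning: for each input, only the
    winning neuron moves its weight vector toward that input."""
    dim = len(vectors[0])
    # Deterministic initialization along the diagonal of [0, 1]^dim.
    weights = [[(j + 0.5) / n_neurons] * dim for j in range(n_neurons)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x in vectors:
            j = winner(x, weights)
            weights[j] = [w + rate * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def extract_keywords(tokens, top_k=3, n_neurons=3):
    feats = word_features(tokens)
    words = sorted(feats)  # fixed presentation order keeps training deterministic
    weights = train_competitive([feats[w] for w in words], n_neurons)
    # Assumed stand-in for "merging weights": score each neuron by the sum of
    # its weight vector; words whose winning neuron scores higher rank first,
    # ties broken by the word's own feature sum, then alphabetically.
    proto = [sum(w) for w in weights]
    ranked = sorted(words, key=lambda w: (-proto[winner(feats[w], weights)],
                                          -sum(feats[w]), w))
    return ranked[:top_k]


def tf_keywords(tokens, top_k=3):
    """TF baseline: the most frequent words (ties broken alphabetically)."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ordered][:top_k]


def tfidf_keywords(tokens, corpus, top_k=3):
    """TF-IDF baseline: tf * log(N / df) over `corpus`, a non-empty list of
    token lists that should include the document itself."""
    n_docs = len(corpus)
    counts = Counter(tokens)

    def score(w):
        df = sum(1 for doc in corpus if w in doc) or 1  # guard against df == 0
        return counts[w] * math.log(n_docs / df)

    return sorted(counts, key=lambda w: (-score(w), w))[:top_k]


def hit_rate(extracted, reference):
    """Assumed definition: share of reference keywords that were extracted."""
    return len(set(extracted) & set(reference)) / len(reference)


def average_hit_rate(results):
    """Mean hit rate over (extracted, reference) pairs, one pair per document."""
    return sum(hit_rate(e, r) for e, r in results) / len(results)
```

For one document, `extract_keywords(tokens, top_k=5)` on its segmented token list returns up to five candidate keywords. Collecting `(extracted, reference)` pairs over a set of manually labeled documents and passing them to `average_hit_rate`, once per method, gives a comparison in the spirit of the evaluation summarized in the abstract. Ranking by the activated neuron rather than by raw counts lets an early, long phrase outrank a frequent short word; whether this matches the paper's actual weighting is not stated in the abstract.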
CLC number:
SHEN Xue-Li, CHENG Yu-Wei. Chinese Keyword Extraction Algorithm Based on Competitive Learning Network[J]. Computer Engineering, 2013, 39(2): 207-210.