
Computer Engineering ›› 2009, Vol. 35 ›› Issue (9): 37-39,4. doi: 10.3969/j.issn.1000-3428.2009.09.013

• Software Technology and Database •

Text Clustering Evaluation Method Based on Parameter Estimation of Distances Between Clusters

ZHENG Jun, WANG Wei, YANG Wu, YANG Yong-tian

  1. (Information Security Research Center, Harbin Engineering University, Harbin 150001)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-05-05 Published: 2009-05-05

Abstract: The proposed text clustering evaluation method applies parameter estimation from statistics: using the observed distances between clusters, it estimates the numerical characteristics of the distribution of those distances. From these estimates it derives a reasonable range for the inter-cluster distances, adjusts the clusters whose distances fall outside that range, and confirms each adjustment with a cluster validity function. The method improves the accuracy of clustering results and offers a practical way to select and analyze clustering algorithms. Experimental results demonstrate its feasibility and effectiveness.

Key words: clustering analysis, text clustering, clustering evaluation, maximum likelihood estimation
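
The estimation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance measure, the distribution model, or the width of the accepted range, so everything here is an assumption made for illustration: Euclidean distance between cluster centroids, a normal model fitted by maximum likelihood, and a range of mean ± k standard deviations. The function names and the parameter `k` are hypothetical. The sketch stops at flagging suspicious cluster pairs; the adjustment step and the cluster validity function are not specified in the abstract and are left out.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def inter_cluster_distances(clusters):
    """Euclidean distance between the centroids of every pair of clusters.

    Assumption: centroid distance stands in for the paper's class distance.
    """
    cents = [centroid(c) for c in clusters]
    return {
        (i, j): math.dist(cents[i], cents[j])
        for i, j in combinations(range(len(cents)), 2)
    }


def normal_mle(values):
    """Maximum likelihood estimates (mean, std) under a normal model.

    The MLE of the variance divides by n, not n - 1.
    """
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(var)


def flag_pairs(clusters, k=2.0):
    """Return the accepted distance range and the cluster pairs outside it.

    Assumption: the reasonable range is mean +/- k * std; k is hypothetical.
    """
    dists = inter_cluster_distances(clusters)
    mu, sigma = normal_mle(list(dists.values()))
    low, high = mu - k * sigma, mu + k * sigma
    flagged = [pair for pair, d in dists.items() if not (low <= d <= high)]
    return (low, high), flagged
```

A pair flagged because its distance is unusually small would be a candidate for merging, and one flagged because its distance is unusually large may point to an outlying cluster. Either change would still need to be confirmed by a validity function, as the abstract requires.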
