Abstract: Building on a method for actively selecting pairwise constraints, this paper proposes an active semi-supervised text clustering method based on pairwise constraints. Latent semantic indexing (LSI) is used to reduce the dimensionality of the text feature space. During clustering, the constructed selection method actively chooses pairwise constraints, and the chosen constraints steer the clustering toward an appropriate partition. Experimental results show that the method improves text clustering accuracy using only a small number of pairwise constraints.
Key words:
text clustering,
semi-supervised clustering,
latent semantic indexing (LSI),
pairwise constraints
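
The idea of actively chosen pairwise constraints steering a clustering can be illustrated with a minimal sketch. It is not the paper's algorithm, since the abstract does not specify the selection criterion. The sketch assumes the documents have already been reduced to low-dimensional vectors (for example by LSI), uses a COP-KMeans-style assignment step as a stand-in for the constrained clustering, and uses a hypothetical smallest-margin heuristic (`select_query`) to pick the pair to ask about. The `truth` list plays the role of the human oracle; all names and data are illustrative.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def violates(i, c, labels, must_link, cannot_link):
    """True if putting point i in cluster c breaks a constraint
    with a partner that has already been assigned."""
    for a, b in must_link:
        if i in (a, b):
            j = b if i == a else a
            if labels[j] is not None and labels[j] != c:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            j = b if i == a else a
            if labels[j] == c:
                return True
    return False

def constrained_kmeans(points, centers, must_link, cannot_link, iters=10):
    """K-means in which each point takes the nearest center that does
    not violate a constraint (COP-KMeans-style; an assumed stand-in)."""
    centers = [list(c) for c in centers]
    labels = [None] * len(points)
    for _ in range(iters):
        labels = [None] * len(points)
        for i, p in enumerate(points):
            order = sorted(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            else:
                labels[i] = order[0]  # fall back if no cluster is feasible
        for c in range(len(centers)):
            members = [points[i] for i, l in enumerate(labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def select_query(points, centers):
    """Hypothetical active heuristic: take the point whose two nearest
    centers are closest in distance, and pair it with the point nearest
    to that point's closest center."""
    def margin(p):
        d = sorted(dist(p, c) for c in centers)
        return d[1] - d[0]
    i = min(range(len(points)), key=lambda i: margin(points[i]))
    c = min(range(len(centers)), key=lambda c: dist(points[i], centers[c]))
    j = min((j for j in range(len(points)) if j != i),
            key=lambda j: dist(points[j], centers[c]))
    return i, j

# Toy "documents" after dimensionality reduction (made-up data).
points = [(0, 0), (0, 1), (1, 0), (2.2, 0.5), (5, 0), (5, 1), (6, 0)]
truth = [0, 0, 0, 1, 1, 1, 1]          # oracle labels, used only to answer queries
init = [points[0], points[4]]

# 1) Cluster without supervision, 2) ask one actively chosen question,
# 3) re-cluster with the resulting constraint.
free_labels, free_centers = constrained_kmeans(points, init, [], [])
pair = select_query(points, free_centers)
must, cannot = [], []
(must if truth[pair[0]] == truth[pair[1]] else cannot).append(pair)
guided_labels, _ = constrained_kmeans(points, init, must, cannot)
```

On this toy data the unsupervised run places the borderline point (index 3) in the wrong cluster, and a single cannot-link constraint obtained from the query moves it to the correct one. A real selection strategy would typically issue several queries and stop at a fixed budget.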
ZHONG Jiang, LIU Long-Hai, LIANG Chuan-Wei. Active Semi-supervised Text Clustering Based on Pairwise Constraints[J]. Computer Engineering, 2011, 37(13): 183-186.