Abstract: In real application environments, existing tree clustering algorithms cannot promptly update their clustering results after the tree database is updated in real time. To address this problem, this paper builds a closed subtree clustering model that supports real-time incremental updating, which solves the incremental clustering of closed subtrees and improves clustering efficiency. Given the semi-structured nature of trees and the low accuracy of existing tree similarity measures, it proposes a more accurate similarity measure that combines node semantics with node-edge structural features. On this basis, it presents the Closed Tree Update Mining (CTUM), Tree Cluster (TC), and Update Tree Cluster (UTC) algorithms, which respectively solve the incremental updating, clustering, and incremental clustering of closed subtrees. Experimental results show that the proposed algorithms achieve high running efficiency and good clustering accuracy.
Key words: clustering algorithm, data mining, closed subtree, incremental update
CLC number:
HUANG Wei, GUO Xin, ZHOU Qing-ping. Closed Subtree Clustering Algorithm Supporting Real-time Incremental Update[J]. Computer Engineering, 2011, 37(24): 25-27.
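The abstract does not give the exact form of the similarity measure or of the TC/UTC algorithms. As a minimal sketch under stated assumptions, the Python below combines a node term (Jaccard overlap of node labels, standing in for node semantics) and a node-edge structural term (Jaccard overlap of labeled parent-child edges) with a hypothetical weight `alpha`, then uses that score in a generic threshold-based, single-pass incremental clusterer. `Tree`, `tree_similarity`, `IncrementalTreeClusterer`, `alpha`, and `threshold` are illustrative names, not the paper's; the sketch compares whole trees and does not reproduce CTUM's closed subtree mining.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    # Hypothetical representation: node id -> label, plus (parent id, child id) edges.
    labels: dict[int, str]
    edges: list[tuple[int, int]] = field(default_factory=list)

    def label_set(self) -> set[str]:
        # Node-level view: which labels occur (assumed stand-in for node semantics).
        return set(self.labels.values())

    def labeled_edges(self) -> set[tuple[str, str]]:
        # Node-edge structural view: which labeled parent-child pairs occur.
        return {(self.labels[p], self.labels[c]) for p, c in self.edges}


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tree_similarity(t1: Tree, t2: Tree, alpha: float = 0.5) -> float:
    # Assumed weighted combination of the node term and the node-edge term.
    node_sim = jaccard(t1.label_set(), t2.label_set())
    edge_sim = jaccard(t1.labeled_edges(), t2.labeled_edges())
    return alpha * node_sim + (1 - alpha) * edge_sim


class IncrementalTreeClusterer:
    """Generic single-pass clustering: each new tree is placed without reclustering."""

    def __init__(self, threshold: float = 0.5, alpha: float = 0.5):
        self.threshold = threshold
        self.alpha = alpha
        self.clusters: list[list[Tree]] = []

    def add(self, tree: Tree) -> int:
        best_idx, best_sim = -1, -1.0
        for i, members in enumerate(self.clusters):
            # Cluster similarity = mean similarity to its current members.
            sim = sum(tree_similarity(tree, m, self.alpha) for m in members) / len(members)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            self.clusters[best_idx].append(tree)
            return best_idx
        # No cluster is similar enough: open a new one.
        self.clusters.append([tree])
        return len(self.clusters) - 1


book = Tree({0: "book", 1: "title", 2: "author"}, [(0, 1), (0, 2)])
book2 = Tree({0: "book", 1: "title", 2: "author", 3: "year"}, [(0, 1), (0, 2), (0, 3)])
movie = Tree({0: "movie", 1: "director"}, [(0, 1)])
clusterer = IncrementalTreeClusterer(threshold=0.5)
print([clusterer.add(t) for t in (book, book2, movie)])  # → [0, 0, 1]
```

A new tree joins the existing cluster with the highest average similarity if that score reaches `threshold`; otherwise it starts a new cluster, so earlier assignments are never recomputed when the database grows. This is one common way to make clustering incremental, not necessarily the approach taken by UTC.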