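Abstract (translated from Chinese): Sequential clustering algorithms are direct and fast, and they do not require the number of clusters to be fixed in advance. When processing massive data sets, however, their running time still leaves room for improvement. TTSAS is a sequential clustering algorithm that uses two thresholds. Building on it, this paper applies the triangle inequality to propose the TI_TTSAS algorithm, which avoids redundant distance computations. Experimental results show that TI_TTSAS is substantially faster than TTSAS and that the improvement becomes more pronounced as the data size grows, while the clustering results retain the accuracy of TTSAS.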
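Keywords (translated from Chinese):
sequential clustering,
triangle inequality,
two-threshold sequential clustering algorithm,
triangle-inequality sequential clustering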
Abstract:
The sequential algorithm is a straightforward clustering algorithm that does not require the number of clusters to be specified in advance. However, when faced with large-scale data, its efficiency still needs to be improved. Based on the two-threshold sequential algorithm scheme (TTSAS), this article presents a new sequential algorithm, TI_TTSAS, which avoids unnecessary distance calculations by applying the triangle inequality. Experiments show that the new algorithm is more effective for datasets with more dimensions, and that it becomes increasingly effective as the number of clusters increases. The results retain the accuracy of the TTSAS algorithm.
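The abstract does not spell out the exact pruning rule, so the following is only a minimal sketch of one standard way the triangle inequality can skip distance computations when searching for the nearest cluster representative, which is the step a sequential scheme such as TTSAS repeats for every point. It assumes Euclidean distance and a precomputed table of distances between representatives. The names `nearest_center`, `centers`, and `center_dists` are hypothetical and do not come from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)

def nearest_center(x, centers, center_dists):
    """Return (index, distance, n_computed) for the center nearest to x.

    center_dists[i][j] holds the precomputed distance between centers i and j.
    This is an illustrative pruning rule, not necessarily the paper's.
    """
    best_i = 0
    best_d = dist(x, centers[0])
    computed = 1  # number of point-to-center distances actually evaluated
    for j in range(1, len(centers)):
        # Triangle inequality: d(c_best, c_j) <= d(x, c_best) + d(x, c_j).
        # If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
        # so center j cannot be strictly closer and is skipped.
        if center_dists[best_i][j] >= 2.0 * best_d:
            continue
        d = dist(x, centers[j])
        computed += 1
        if d < best_d:
            best_i, best_d = j, d
    return best_i, best_d, computed

# Hypothetical example data: four well-separated representatives.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
center_dists = [[dist(a, b) for b in centers] for a in centers]
idx, d, computed = nearest_center((1.0, 1.0), centers, center_dists)
```

In a two-threshold scheme, the returned distance would then be compared against the two thresholds to decide whether the point joins the nearest cluster, starts a new one, or is deferred. The pruning does not change which center is found, only how many distances are evaluated.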
Key words:
Sequential clustering,
Triangle inequality,
TTSAS,
TI_TTSAS
Chinese Library Classification (CLC) number:
CHEN Xiaoyun; WANG Ping; HE Chunxia; LENG Mingwei. Using Triangle Inequality to Accelerate TTSAS Cluster Algorithm [J]. Computer Engineering, 2006, 32(17): 97-99,1.