Abstract: Binned kernel density estimation based on grid gravity centers, originally developed for very large datasets, is extended to data stream applications. With a density decay scheme introduced, it is shown that for evolving data streams the approximation error incurred by replacing the set of discrete data points in each grid cell with their gravity center is bounded. On this basis, a kernel density estimation algorithm for multidimensional evolving data streams is constructed. Experimental results show that the method captures the real-time evolving behavior of the data stream while maintaining sufficient accuracy.
Key words: kernel density estimation; data stream; evolution; binning rule; grid
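The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the class name `DecayedGridKDE`, the exponential decay form `2 ** (-decay_rate * age)`, the Gaussian product kernel, and the parameters `cell_width`, `bandwidth`, and `decay_rate` are all assumptions chosen for illustration. Each grid cell stores only a decayed weight and a decayed coordinate sum, so its gravity center stands in for the individual points that fell into it.

```python
import math


class DecayedGridKDE:
    """Illustrative binned KDE over a stream using decayed per-cell gravity centers.

    All names and the decay form are assumptions, not taken from the paper.
    Timestamps passed to update() and density() must be non-decreasing.
    """

    def __init__(self, dim, cell_width, bandwidth, decay_rate):
        self.dim = dim
        self.cell_width = cell_width    # side length of each grid cell (assumed)
        self.bandwidth = bandwidth      # kernel bandwidth h, same on every axis (assumed)
        self.decay_rate = decay_rate    # lambda in the weight 2 ** (-lambda * age) (assumed)
        self.cells = {}                 # cell index -> [weight, coordinate_sum, last_time]

    def _decay(self, cell, now):
        # Lazily fade the cell's weight and coordinate sum up to the current time.
        factor = 2.0 ** (-self.decay_rate * (now - cell[2]))
        cell[0] *= factor
        cell[1] = [s * factor for s in cell[1]]
        cell[2] = now

    def update(self, point, now):
        # Map the point to its grid cell and fold it into the cell summary.
        key = tuple(math.floor(x / self.cell_width) for x in point)
        cell = self.cells.setdefault(key, [0.0, [0.0] * self.dim, now])
        self._decay(cell, now)
        cell[0] += 1.0
        cell[1] = [s + x for s, x in zip(cell[1], point)]

    def density(self, query, now):
        # Weighted Gaussian kernel sum over cell gravity centers, normalized by total weight.
        norm = (math.sqrt(2.0 * math.pi) * self.bandwidth) ** self.dim
        total, total_weight = 0.0, 0.0
        for cell in self.cells.values():
            self._decay(cell, now)
            weight, coord_sum, _ = cell
            if weight <= 0.0:
                continue
            center = [s / weight for s in coord_sum]   # gravity center of the cell
            sq = sum(((q - c) / self.bandwidth) ** 2 for q, c in zip(query, center))
            total += weight * math.exp(-0.5 * sq) / norm
            total_weight += weight
        return total / total_weight if total_weight > 0.0 else 0.0
```

In this sketch, memory grows with the number of occupied cells rather than with the number of stream points, and older regions of the stream lose influence as their cell weights decay.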
CLC number:
LUO Jian. Kernel Density Estimation for Multidimensional Evolution Data Stream[J]. Computer Engineering, 2011, 37(17): 46-48,60.