
Computer Engineering ›› 2011, Vol. 37 ›› Issue (17): 46-48,60. doi: 10.3969/j.issn.1000-3428.2011.17.014

• Software Technology and Database •

Kernel Density Estimation for Multidimensional Evolution Data Stream

LUO Jian

  1. (School of Digital Information Technology, Zhejiang Technology Institute of Economy, Hangzhou 310018, China)
  • Received: 2011-03-17 Online: 2011-09-05 Published: 2011-09-05
  • About the author: LUO Jian (born 1971), male, senior engineer, associate professor, M.S.; main research interests: knowledge discovery in databases, grid computing

Abstract: Binned kernel density estimation based on grid gravity centers, originally designed for very large datasets, is extended to data stream applications. With a density decay technique introduced, it is shown that, for evolving data streams, the approximation error of the binned estimator that replaces the set of discrete data points in each grid cell with the gravity center of that cell is bounded. On this basis, a kernel density estimation algorithm for multidimensional evolving data streams is constructed. Experimental results show that the method captures the real-time evolving behavior of the data stream while maintaining sufficient accuracy.

Key words: kernel density estimation, data stream, evolution, binning rule, grid
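The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details are not given on this page. The sketch assumes an exponential decay factor of the form 2^(-λ·Δt), a product Gaussian kernel with a single bandwidth, and a uniform grid; the class name `DecayedBinnedKDE` and all parameter names are hypothetical. Each grid cell keeps only a decayed weight and a decayed coordinate sum, so its gravity center stands in for the individual points that fell into it.

```python
import math


class DecayedBinnedKDE:
    """Illustrative binned KDE over a stream, with decayed cell weights.

    Assumptions (not taken from the paper): decay factor 2 ** (-decay_rate * dt),
    Gaussian product kernel, uniform grid of width `cell_width`.
    """

    def __init__(self, cell_width, bandwidth, decay_rate):
        self.cell_width = cell_width
        self.bandwidth = bandwidth
        self.decay_rate = decay_rate
        # cell index -> (decayed weight, decayed coordinate sum, last update time)
        self.cells = {}

    def _decay(self, dt):
        return 2.0 ** (-self.decay_rate * dt)

    def update(self, point, t):
        """Absorb one stream point arriving at time t into its grid cell."""
        key = tuple(math.floor(v / self.cell_width) for v in point)
        weight, coord_sum, last_t = self.cells.get(key, (0.0, [0.0] * len(point), t))
        f = self._decay(t - last_t)
        new_sum = [s * f + v for s, v in zip(coord_sum, point)]
        self.cells[key] = (weight * f + 1.0, new_sum, t)

    def density(self, x, t):
        """Estimate the density at x as of time t from the cell gravity centers."""
        norm = (math.sqrt(2.0 * math.pi) * self.bandwidth) ** len(x)
        acc = 0.0
        total = 0.0
        for weight, coord_sum, last_t in self.cells.values():
            w = weight * self._decay(t - last_t)      # weight decayed to time t
            center = [s / weight for s in coord_sum]  # gravity center of the cell
            sq = sum(((a - c) / self.bandwidth) ** 2 for a, c in zip(x, center))
            acc += w * math.exp(-0.5 * sq) / norm
            total += w
        return acc / total if total > 0 else 0.0
```

Under these assumptions, a stream that drifts from one region to another yields an estimate dominated by the recent region, because the weights of stale cells shrink toward zero; with `decay_rate=0` the sketch reduces to an ordinary gravity-center binned estimator over all points seen so far.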

CLC number: