Abstract: Data stream mining requires algorithms that respond quickly, use little memory, and adapt to concept drift. To meet these requirements, this paper proposes AHBag, an online Bagging classification algorithm based on Hoeffding trees that adapts to concept drift. Using statistical theory, the algorithm tests whether the classification accuracy of the model on the data in an adaptive window falls within the one-sided confidence interval of the true error rate; depending on the test result, it either updates the existing Hoeffding tree or rebuilds a new one. Experimental results show that the algorithm achieves high classification accuracy on data streams with concept drift.
Key words:
data stream,
concept drift,
Hoeffding tree,
online Bagging
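
The drift test summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact statistic, so everything below is an assumption made for illustration: a normal approximation to the binomial error count, a 95% one-sided confidence level, and the hypothetical names `upper_error_bound` and `drift_detected`. It is not the authors' implementation of AHBag.

```python
import math

# Assumption: 95% one-sided level; 1.645 is the standard normal quantile for it.
Z_ONE_SIDED_95 = 1.645


def upper_error_bound(reference_error: float, window_size: int,
                      z: float = Z_ONE_SIDED_95) -> float:
    """Upper limit of a one-sided interval for the error rate observed
    in a window of `window_size` examples, using a normal approximation
    around the reference (estimated true) error rate."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= reference_error <= 1.0:
        raise ValueError("reference_error must lie in [0, 1]")
    std_err = math.sqrt(reference_error * (1.0 - reference_error) / window_size)
    return reference_error + z * std_err


def drift_detected(window_errors: int, window_size: int,
                   reference_error: float,
                   z: float = Z_ONE_SIDED_95) -> bool:
    """Return True when the window error rate exceeds the one-sided bound.

    In this sketch, True would mean "rebuild the tree" and False would
    mean "keep updating the existing tree"."""
    observed_error = window_errors / window_size
    return observed_error > upper_error_bound(reference_error, window_size, z)
```

Under these assumptions, a window of 200 examples with a reference error rate of 0.10 tolerates an observed error rate of roughly 0.135 before the check signals drift; an ensemble member that fails the check would be rebuilt, while one that passes would continue to be updated incrementally.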
CLC number:
WANG Li-Ming, ZHOU Chi. Online Ensemble Classifier for Adaptive Concept Drift[J]. Computer Engineering, 2011, 37(5): 74-76.