
Computer Engineering ›› 2008, Vol. 34 ›› Issue (6): 35-37. doi: 10.3969/j.issn.1000-3428.2008.06.012

• Doctoral Dissertations •

Diversity and Performance Comparison for Ensemble Learning Algorithms

LI Kai¹, CUI Li-juan²

  (1. School of Mathematics and Computer, Hebei University, Baoding 071002; 2. Library of Hebei University, Baoding 071002)
  • Online: 2008-03-20 Published: 2008-03-20

Abstract: Taking diversity as the starting point, this paper studies two families of ensemble learning algorithms: those based on feature-set techniques, which use some strategy to select different feature sets from which the training sets are formed, and those based on data techniques, which use sampling to select different training sets. The ways in which the two families generate diversity are analyzed. With decision trees and neural networks as base models, the performance of the ensemble learning algorithms is studied experimentally on 10 standard data sets. The results show that performance depends on factors such as the characteristics of the data set and the method used to generate diversity. In terms of overall performance, the data-based ensemble learning algorithms outperform the feature-set-based ones on most data sets.

Key words: diversity, ensemble learning, feature set, sampling, performance
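
The two sources of diversity named in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling scheme or feature-selection strategy the authors used, so bootstrap resampling and uniformly random feature subsets are assumptions made here for illustration only. The function names `bootstrap_view`, `feature_subset_view`, and `majority_vote`, and the toy data, are hypothetical.

```python
import random
from collections import Counter


def bootstrap_view(X, y, rng):
    """Data-based diversity (assumed scheme): resample rows with
    replacement while keeping every feature column."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]


def feature_subset_view(X, y, k, rng):
    """Feature-set-based diversity (assumed scheme): keep every row
    but only k randomly chosen feature columns."""
    n_features = len(X[0])
    cols = sorted(rng.sample(range(n_features), k))
    return [[row[c] for c in cols] for row in X], list(y), cols


def majority_vote(predictions):
    """Combine the labels predicted by the ensemble members for one
    example by plurality vote."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data, invented for this sketch: 4 examples, 4 features, 2 classes.
rng = random.Random(0)
X = [
    [0.1, 1.0, 5.0, 2.0],
    [0.4, 0.9, 4.0, 1.0],
    [0.8, 0.2, 3.0, 0.0],
    [0.9, 0.1, 1.0, 3.0],
]
y = [0, 0, 1, 1]

# Each ensemble member would be trained on its own view of the data.
Xb, yb = bootstrap_view(X, y, rng)
Xf, yf, cols = feature_subset_view(X, y, 2, rng)
```

In either family, each base model (a decision tree or neural network in the paper) would be trained on its own view, and the members' predictions combined, for example with `majority_vote`. A member trained on a feature-subset view must also be given the same columns `cols` at prediction time.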

CLC Number: