
Computer Engineering, 2012, Vol. 38, Issue (14): 32-34. doi: 10.3969/j.issn.1000-3428.2012.14.009

• Software Technology and Database •

Decision Tree Pruning Optimization Algorithm of CDC and REP Combination

CHANG Xu a, LI Yi-jie b, LIU Wan-jun b

  1. (a. Graduate School; b. School of Software, Liaoning Technical University, Huludao 125105, China)
  • Received: 2011-09-23 Online: 2012-07-20 Published: 2012-07-20
  • About the authors: CHANG Xu (born 1987), female, master's student; main research interests: data mining and database technology. LI Yi-jie and LIU Wan-jun, professors.
  • Funding:
    National Natural Science Foundation of China (61172144); Liaoning Provincial Department of Education fund project (2009A350)

Abstract: Pruning algorithms that rely on pre-pruning alone or on post-pruning alone tend to give long classification response times and low accuracy. To address this, a decision tree pruning optimization algorithm is proposed that combines Child Difference Choose (CDC) with Reduced Error Pruning (REP), building on REP post-pruning. While the decision tree is being generated, the CDC algorithm uses the difference ratio between left and right subtree nodes to exclude some non-leaf nodes; once the tree has been generated, the REP algorithm prunes it further. Experimental results show that the algorithm avoids the over-fitting that arises when a large decision tree is grown in too much detail and that, compared with other algorithms, it reduces splitting time and improves the accuracy of decision tree splitting.

Key words: decision tree, pruning, Child Difference Choose (CDC) algorithm, Reduced Error Pruning (REP) algorithm, leaf node
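
The post-pruning stage named in the abstract, Reduced Error Pruning, can be illustrated with a minimal sketch. This is the textbook form of REP, not the authors' implementation: the `Node` class, the `predict`, `errors`, and `rep_prune` helpers, the threshold-style binary split, and the use of a separate pruning set are all assumptions made for illustration. The CDC difference ratio is not defined in the abstract, so the pre-pruning stage is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical binary decision-tree node; no children means leaf."""
    label: int                       # majority training class at this node
    feature: Optional[int] = None    # index of the feature tested here
    threshold: float = 0.0           # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: Sequence[float]) -> int:
    """Walk from the given node down to a leaf and return its label."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def errors(node: Node, X: List[Sequence[float]], y: List[int]) -> int:
    """Count misclassified samples for the subtree rooted at node."""
    return sum(predict(node, x) != t for x, t in zip(X, y))


def rep_prune(node: Node, X: List[Sequence[float]], y: List[int]) -> Node:
    """Bottom-up REP: collapse a subtree into a leaf whenever doing so
    does not increase the error on the pruning samples reaching it."""
    if node.is_leaf():
        return node
    # Route the pruning samples to the child each one would visit.
    go_left = [x[node.feature] <= node.threshold for x in X]
    lx = [x for x, g in zip(X, go_left) if g]
    ly = [t for t, g in zip(y, go_left) if g]
    rx = [x for x, g in zip(X, go_left) if not g]
    ry = [t for t, g in zip(y, go_left) if not g]
    node.left = rep_prune(node.left, lx, ly)
    node.right = rep_prune(node.right, rx, ry)
    subtree_err = errors(node, X, y)
    leaf_err = sum(t != node.label for t in y)
    # Assumption: ties favour the smaller tree, the usual REP convention.
    if leaf_err <= subtree_err:
        node.left = node.right = None
        node.feature = None
    return node
```

Calling `rep_prune(root, X_prune, y_prune)` on a fully grown tree returns the same root with every subtree that does not help on the pruning set collapsed into a leaf. In a combined scheme like the one the abstract describes, this pass would run on a tree that pre-pruning has already kept smaller, so there are fewer subtrees left to evaluate.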

CLC Number: