
Computer Engineering, 2007, Vol. 33, Issue 10: 222-224. doi: 10.3969/j.issn.1000-3428.2007.10.080

• Artificial Intelligence and Recognition Technology •

Research on Classification Accuracy Limit of Decision Tree

NIU Kun¹, CHEN Junliang¹, ZHANG Shubo²

  1. School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876
  2. Department of Strategy Research, Beijing Research Institute of China Telecom, Beijing 100035
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-05-20 Published: 2007-05-20

Abstract: Using the maximum classification tree as a tool for analyzing empirical risk and structural risk, this paper studies the limit of decision tree classification accuracy. Because the classification performance of a decision tree model is difficult to evaluate objectively, the paper discusses the conditions under which an accuracy limit exists and gives a method for computing it. It proposes four theorems stating whether the limit exists under each of four distribution conditions of empirical risk and structural risk, and discusses them from the perspectives of machine learning theory and engineering modeling practice. Experiments on ten public datasets validate the theory.

Key words: Decision tree, Classification accuracy, Limit, Empirical risk, Structural risk
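
The abstract does not define the maximum classification tree, so the following is only an illustrative sketch under a stated assumption: that such a tree is grown until no further split is possible, so each distinct feature vector ends up in its own leaf. Under that assumption, the best accuracy any tree can reach on the training data (the empirical-risk side only) is fixed by how often identical feature vectors carry conflicting labels. The function name `training_accuracy_ceiling` and the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def training_accuracy_ceiling(rows, labels):
    """Upper bound on training accuracy for any decision tree.

    Assumption (not from the paper): a fully grown tree can at best
    isolate each distinct feature vector in its own leaf, and each
    leaf predicts the majority label of the samples it contains.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    # Count the labels observed for each distinct feature vector.
    label_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        label_counts[tuple(row)][label] += 1

    # Samples sharing a feature vector cannot be separated by any split,
    # so at most the majority label in each group is classified correctly.
    correct = sum(max(counts.values()) for counts in label_counts.values())
    return correct / len(labels)


# Toy data: the first three samples are identical but disagree on the label.
rows = [
    ("sunny", "hot"), ("sunny", "hot"), ("sunny", "hot"),
    ("rain", "mild"), ("rain", "mild"), ("rain", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes", "no"]

ceiling = training_accuracy_ceiling(rows, labels)
print(f"training accuracy ceiling: {ceiling:.4f}")
```

On the toy data, five of the six samples can be classified correctly at best, because one of the three identical `("sunny", "hot")` samples disagrees with the other two. This bound says nothing about structural risk or accuracy on unseen data, which the paper's four theorems also cover.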
