Abstract:
Using the maximum classification tree as a tool for analyzing empirical risk and structural risk, this paper studies the limit of the classification accuracy of decision trees. Because the classification performance of a decision tree model is difficult to evaluate objectively, the paper discusses the conditions under which a classification accuracy limit exists and presents a method for obtaining it. It proposes four theorems on whether the limit exists under four distribution conditions of empirical risk and structural risk, and discusses them from the perspectives of machine learning theory and practical modeling. The theorems are validated by experiments on ten public datasets.
Key words: decision tree, classification accuracy, limit, empirical risk, structural risk
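Abstract (translated from Chinese): The maximum classification tree is used as a tool for analyzing empirical risk and structural risk in a study of the classification accuracy limit of decision trees. To address the difficulty of objectively evaluating the classification performance of decision tree models, the conditions under which the classification accuracy limit exists are discussed and a method for obtaining this limit is given. With the maximum classification tree as the analysis tool, four theorems are proposed on whether the classification accuracy limit exists under four distribution conditions of empirical risk and structural risk, and they are discussed from two perspectives: machine learning theory and engineering modeling practice. Experiments verify the correctness of the theory.
Key words (translated from Chinese): decision tree, classification accuracy, limit, empirical risk, structural risk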
NIU Kun; CHEN Junliang; ZHANG Shubo. Research on Classification Accuracy Limit of Decision Tree[J]. Computer Engineering, 2007, 33(10): 222-224.
牛 琨;陈俊亮;张舒博. 决策树分类准确率极限的研究[J]. 计算机工程, 2007, 33(10): 222-224.