Computer Engineering ›› 2008, Vol. 34 ›› Issue (3): 9-11. doi: 10.3969/j.issn.1000-3428.2008.03.004

• Doctoral Dissertations •

New Decision Tree Algorithm Based on Rough Set Model

GAO Jing¹, XU Zhang-yan¹,², SONG Wei¹, YANG Bing-ru¹

  (1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083; 2. Department of Computer, Guangxi Normal University, Guilin 541004)
  • Online: 2008-02-05 Published: 2008-02-05

Abstract: Decision tree generation algorithms based on the rough set model classify exactly, so they tend to partition the examples too finely and cannot avoid the negative influence that a few special examples have on the tree. The resulting tree is too large to understand easily, and its ability to classify and predict future data is weakened. To address these problems, this paper proposes a new decision tree generation algorithm based on the rough set model that introduces a hold-down factor. For a node that is about to be expanded, one more termination condition is added to the usual ones: if the hold-down factor of the samples is greater than a given threshold, the node is not expanded further. This effectively avoids over-fine partitioning, keeps the generated tree from growing too large, and makes it easier for users to understand.

Key words: decision tree, ID3 algorithm, rough set, hold-down factor, upper approximation set, lower approximation set
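
The extra termination condition described in the abstract can be illustrated with a minimal sketch. The abstract does not define the hold-down factor, so the definition used here (the share of the majority class among the samples at a node) is purely an assumption for illustration, as are the names `hold_down_factor` and `should_stop`; the paper's actual definition is presumably stated in terms of rough set approximations and may differ.

```python
from collections import Counter


def hold_down_factor(labels):
    """Hypothetical stand-in for the paper's hold-down factor:
    the fraction of samples at the node that belong to the majority class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def should_stop(labels, remaining_attributes, threshold):
    """Decide whether a node should become a leaf instead of being expanded."""
    # Usual termination condition 1: every sample has the same class.
    if len(set(labels)) == 1:
        return True
    # Usual termination condition 2: no attributes are left to split on.
    if not remaining_attributes:
        return True
    # Additional condition from the abstract: stop once the hold-down
    # factor exceeds the given threshold (factor definition is assumed).
    return hold_down_factor(labels) > threshold


# Nine "yes" samples and one exceptional "no" sample at a node.
node_labels = ["yes"] * 9 + ["no"]
stop_loose = should_stop(node_labels, ["outlook"], threshold=0.85)
stop_strict = should_stop(node_labels, ["outlook"], threshold=0.95)
```

Under this assumed definition, a threshold of 0.85 turns the node into a leaf despite the single exceptional sample, while a threshold of 0.95 lets the node be split further, which is the trade-off between tree size and exactness that the abstract describes.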
