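Abstract: In the CAIM algorithm, the discretization criterion considers only the dependency between the attribute and the most frequent class in each interval, which leads to over-discretization and imprecise results. To address this, an improved CAIM algorithm is proposed. It discretizes attributes in ascending order of attribute importance and, based on rough set theory, introduces the concept of the discernibility rate of condition attributes, which together with the approximation accuracy controls the final degree of discretization of the information table and effectively resolves the over-discretization problem. In the experiments, C4.5 and a support vector machine are used for recognition and classification prediction on the discretized data, and the results confirm the effectiveness of the algorithm.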
Keywords:
discretization of continuous attributes,
rough set,
attribute discernibility rate
Abstract: In the Class-Attribute Interdependency Maximization (CAIM) algorithm, the discretization criterion accounts only for the tendency to maximize the number of values that belong to the leading class within each interval. This weakness causes CAIM to generate unreasonable discretization results and in turn lowers the predictive accuracy of a classifier. This paper proposes a modified CAIM algorithm. The algorithm takes the importance of attributes into account during discretization, and a concept of attribute discernibility rate is proposed based on rough set theory. The attribute discernibility rate and the approximation quality are used together to control the discretization intervals, which effectively resolves the problem of over-discretization. Experiments that apply C4.5 and SVM to the discretized data show that the presented algorithm is effective.
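The limitation described in the abstract can be illustrated with a small sketch. The abstract does not state the criterion itself, so the code below assumes the commonly cited CAIM definition: for n intervals, the mean over intervals of max_r^2 / M_r, where max_r is the largest class count in interval r and M_r is the total count in that interval. The function name `caim` and the quanta-matrix layout are hypothetical choices made for this illustration, not taken from the paper.

```python
from typing import Sequence


def caim(quanta: Sequence[Sequence[int]]) -> float:
    """CAIM value for a quanta matrix (assumed layout, see lead-in).

    quanta[i][r] is the number of samples of class i whose attribute
    value falls into interval r.
    """
    n_intervals = len(quanta[0])
    total = 0.0
    for r in range(n_intervals):
        column = [row[r] for row in quanta]  # class counts in interval r
        interval_size = sum(column)
        if interval_size == 0:
            continue  # an empty interval contributes nothing
        # Only the largest class count enters the criterion.
        total += max(column) ** 2 / interval_size
    return total / n_intervals


if __name__ == "__main__":
    # Two classes, two intervals.
    print(caim([[8, 1], [2, 9]]))
    # Three classes, one interval: same leading class and interval size,
    # but different distributions of the remaining classes.
    print(caim([[6], [2], [2]]), caim([[6], [4], [0]]))
```

The last two calls return the same value even though the non-leading classes are distributed differently. This is the sense in which the criterion looks only at the leading class of each interval.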
Key words:
discretization of continuous attributes,
rough set,
attribute discernibility rate
CLC number:
李慧; 闫德勤; 张迎春. An improved CAIM algorithm [J]. 计算机工程, 2010, 36(4): 77-78.
LI Hui; YAN De-qin; ZHANG Ying-chun. Modified Algorithm of CAIM[J]. Computer Engineering, 2010, 36(4): 77-78.