Computer Engineering, 2008, Vol. 34, Issue (4): 111-112. doi: 10.3969/j.issn.1000-3428.2008.04.038

• Software Technology and Database •

Discretization Algorithm of Continuous Attributes Based on Cramer’s V

GUO Qi-ming (郭启铭), FAN Wei (樊玮)

  1. (Software Technology Research Center, Civil Aviation University of China, Tianjin 300300)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-02-20 Published: 2008-02-20

Abstract: Building on class-attribute dependence discretization methods, this paper proposes CVM, a discretization algorithm for continuous attributes based on Cramer’s V. The method uses the Cramer’s V statistic to quantify the dependence between the class and the discretized attribute, so that the class-attribute dependence after discretization is maximized. CVM is compared experimentally with the CADD and CAIM algorithms, and the discretized data are evaluated with the C4.5 classifier. The results show that CVM performs well and that data discretized by CVM markedly improve the prediction accuracy of the classifier.

Key words: continuous attribute, discretization, classification
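
The abstract does not spell out how CVM searches for cut points, so the following is only a minimal sketch of the quantity it relies on: the standard Cramer’s V statistic, V = sqrt(chi2 / (n * (min(r, c) - 1))), computed on a class-by-interval contingency table. The function names `cramers_v` and `quanta_matrix`, the toy data, and the two candidate cut points are all assumptions made for illustration; this is not the authors' CVM algorithm.

```python
import math

def cramers_v(table):
    """Standard Cramer's V for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

def quanta_matrix(values, labels, cuts):
    """Count class labels (rows) per interval (columns) for sorted cut points."""
    classes = sorted(set(labels))
    table = [[0] * (len(cuts) + 1) for _ in classes]
    for v, y in zip(values, labels):
        interval = sum(v > c for c in cuts)  # index of the interval containing v
        table[classes.index(y)][interval] += 1
    return table

# Hypothetical toy data: one continuous attribute and a two-class label.
values = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Two candidate single-cut discretizations (assumed, for comparison only).
good = cramers_v(quanta_matrix(values, labels, cuts=[4.0]))
poor = cramers_v(quanta_matrix(values, labels, cuts=[1.2]))
print(good, poor)
```

On this toy data the cut at 4.0 separates the two classes completely and scores higher than the cut at 1.2, which is the sense in which a discretization with larger Cramer’s V preserves more class-attribute dependence.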

CLC Number: