Computer Engineering ›› 2008, Vol. 34 ›› Issue (4): 111-112. doi: 10.3969/j.issn.1000-3428.2008.04.038

• Software Technology and Database •

Discretization Algorithm of Continuous Attributes Based on Cramer’s V

GUO Qi-ming, FAN Wei   

  1. (Software Technology Research Center, Civil Aviation University of China, Tianjin 300300)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-02-20 Published: 2008-02-20

Abstract: Building on class-attribute dependence discretization methods, this paper proposes CVM, a new discretization algorithm for continuous attributes based on Cramer’s V. The method uses the Cramer’s V statistic to measure the correlation between the class and the discretized attribute, and seeks the discretization that maximizes this correlation. CVM is compared experimentally with the CADD and CAIM algorithms. The results show that CVM is more effective, and that C4.5 achieves higher classification accuracy on data discretized by CVM.

Key words: continuous attribute, discretization, classification
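
The abstract does not spell out the CVM procedure itself, so the following is only a minimal sketch of the statistic it relies on: Cramer’s V computed over a class-by-interval count table, using the standard definition V = sqrt(chi2 / (n * (min(rows, cols) - 1))). The function names `quanta_matrix` and `cramers_v` and the toy data are assumptions made for illustration and do not come from the paper.

```python
import math


def quanta_matrix(values, labels, cut_points):
    """Count samples per (class, interval) for a given set of cut points.

    Hypothetical helper: rows are classes, columns are the intervals
    induced by the sorted cut points.
    """
    classes = sorted(set(labels))
    cuts = sorted(cut_points)
    table = [[0] * (len(cuts) + 1) for _ in classes]
    for value, label in zip(values, labels):
        interval = sum(1 for c in cuts if value > c)  # index of the interval
        table[classes.index(label)][interval] += 1
    return table


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0  # undefined case, treated here as "no association"
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows or columns
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Toy data (assumed): two classes separated around 2.5.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
for cut in (1.5, 2.5):
    v = cramers_v(quanta_matrix(values, labels, [cut]))
    print(f"cut at {cut}: V = {v:.3f}")
```

On this toy data the cut at 2.5 separates the two classes exactly and scores higher than the cut at 1.5, which is the sense in which a discretization can be chosen to maximize class-attribute correlation. How CVM searches over candidate cut points is not described in the abstract and is not reproduced here.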
