Computer Engineering, 2013, Vol. 39, Issue (4): 48-51. doi: 10.3969/j.issn.1000-3428.2013.04.012

• Advanced Computing and Data Processing •

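Improved Algorithm Based on CACC for Discretization of Continuous Data

LIU Xiao-long 1, JIANG Hong 1, WU Dan 2

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China; 2. Sichuan Aerospace Polytechnic, Chengdu 610100, China
  • Received: 2012-07-04; Publication date: 2013-04-15; Release date: 2013-04-12
  • About the authors: LIU Xiao-long (b. 1987), male, master's student, research interests: intelligent algorithms and data mining; JIANG Hong, professor, Ph.D.; WU Dan, teaching assistant
  • Funding:
    National Natural Science Foundation of China project "Research on Key Technologies of Intelligent Learning and Decision-Making for Cognitive Radio" (61072138)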

Abstract: Rough set methods and most mainstream machine learning algorithms generally cannot handle continuous data efficiently. To address this problem, this paper presents an improved CACC-based algorithm for discretizing continuous data. The algorithm selects cut points using the CACC criterion and adds a constraint on the data inconsistency rate, thereby reducing the amount of information lost during discretization. In simulations, the improved algorithm is compared with Modified Chi2, Extent-Chi2, CAIM, and CACC, and the discretized data are validated with the C4.5 and SVM algorithms; the data recognition rate and accuracy improve by up to about 8%.

Keywords: rough set, discretization, important attributes, inconsistency rate, improved CACC algorithm, accuracy

CLC Number:
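
The abstract names two quantities without defining them: the CACC criterion used to score candidate cut points and the data inconsistency rate used as a constraint. The Python sketch below illustrates both under stated assumptions. It uses the CACC formula as commonly given in the discretization literature, sqrt(y' / (y' + M)) with y' = M(ΣΣ q_ir² / (M_i+ · M_+r) − 1) / ln(n), and the inconsistency rate in the usual Chi2-style sense (samples outside the majority class of their discretized pattern, divided by the total). The function names `cacc_value` and `inconsistency_rate` are hypothetical, and the exact way the paper combines the two is not stated on this page.

```python
import math
from bisect import bisect_left
from collections import Counter, defaultdict


def cacc_value(values, labels, cuts):
    """CACC score of one attribute split at the inner cut points `cuts`.

    Assumed formula: sqrt(y' / (y' + M)), where
    y' = M * (sum_ir q_ir^2 / (M_i+ * M_+r) - 1) / ln(n),
    M is the sample count and n the number of intervals.
    """
    cuts = sorted(cuts)
    n_intervals = len(cuts) + 1
    if n_intervals < 2:
        return 0.0  # ln(1) = 0, so the score is undefined for a single interval
    total = len(values)
    # Quanta matrix: number of samples of class y falling in interval r.
    quanta = Counter((y, bisect_left(cuts, v)) for v, y in zip(values, labels))
    class_totals = Counter(labels)
    interval_totals = Counter()
    for (_, r), q in quanta.items():
        interval_totals[r] += q
    s = sum(q * q / (class_totals[y] * interval_totals[r])
            for (y, r), q in quanta.items())
    # max() guards against a tiny negative value caused by rounding.
    y_prime = max(0.0, total * (s - 1.0) / math.log(n_intervals))
    return math.sqrt(y_prime / (y_prime + total))


def inconsistency_rate(rows, labels):
    """Share of samples that disagree with the majority class of their pattern.

    `rows` holds the discretized attribute values of each sample.
    """
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row)][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(labels)
```

One plausible use, offered only as an illustration: pick the candidate cut with the highest `cacc_value`, then accept it only if `inconsistency_rate` of the resulting discretized table stays below a chosen threshold.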