
Computer Engineering, 2019, Vol. 45, Issue (8): 75-79, 91. doi: 10.19678/j.issn.1000-3428.0051759

• Advanced Computing and Data Processing •

Unbalanced Data Classification Based on Hesitant Fuzzy Decision Tree

ZHANG Xu, ZHOU Xinzhi, ZHAO Chengping, SHAO Lun

  1. College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2018-06-06 Revised: 2018-08-27 Published: 2019-08-15 Online: 2019-08-08
  • About the authors: ZHANG Xu (born 1992), male, master's student, main research interests: intelligent control and data mining; ZHOU Xinzhi, professor, Ph.D.; ZHAO Chengping, associate professor, Ph.D.; SHAO Lun, master's student.
  • Supported by:
    National Key Basic Research and Development Program of China (2013CB328903-2).

Abstract: To improve classification performance on unbalanced data, an improved fuzzy decision tree algorithm is proposed that combines hesitant fuzzy set theory with the decision tree algorithm. The unbalanced data are first oversampled with the SMOTE algorithm, the cluster centers of each attribute are obtained with K-means clustering, and the dataset is fuzzified using two different membership functions. On this basis, the Hesitant Fuzzy Information Gain (HFIG) of each attribute is computed from the membership functions and the information energy of hesitant fuzzy sets. The largest HFIG replaces the Fuzzy Information Gain (FIG) of the Fuzzy ID3 algorithm as the attribute splitting criterion, and a Hesitant Fuzzy Decision Tree (HFDT) model is constructed for unbalanced data classification. Experimental results show that, compared with traditional classification algorithms such as C4.5, KNN, and random forest, the HFDT-based classifier improves the AUC evaluation metric by 12.6% on average.

Key words: unbalanced data, hesitant fuzzy sets, Hesitant Fuzzy Decision Tree (HFDT), K-means clustering, Fuzzy ID3 algorithm
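
The fuzzification step described in the abstract (K-means cluster centers per attribute, then two membership functions) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the two membership functions, so triangular and Gaussian shapes are assumed here; the function names, the deterministic K-means initialization, and the choice of `sigma` are all hypothetical. SMOTE oversampling would run before this step and is omitted, and the HFIG computation is not shown because its formula is not given in this abstract.

```python
import math


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on a single attribute; returns the sorted cluster centers."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption): k evenly spaced order statistics.
    step = (n - 1) / max(k - 1, 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def triangular_memberships(x, centers):
    """Assumed shape 1: triangles peaking at each center, flat shoulders at the ends."""
    out = [0.0] * len(centers)
    if x <= centers[0]:
        out[0] = 1.0
    elif x >= centers[-1]:
        out[-1] = 1.0
    else:
        for j in range(len(centers) - 1):
            lo, hi = centers[j], centers[j + 1]
            if lo <= x <= hi:
                out[j + 1] = (x - lo) / (hi - lo)
                out[j] = 1.0 - out[j + 1]
                break
    return out


def gaussian_memberships(x, centers, sigma):
    """Assumed shape 2: a Gaussian bump of width sigma around each center."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


def hesitant_fuzzify(x, centers, sigma):
    """Pair the two membership degrees of x for every fuzzy set (one pair per center)."""
    tri = triangular_memberships(x, centers)
    gau = gaussian_memberships(x, centers, sigma)
    return list(zip(tri, gau))


# Toy attribute with three visible groups (made-up values, for illustration only).
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]
centers = kmeans_1d(values, k=3)
# Hypothetical width: half the smallest gap between neighboring centers.
sigma = min(b - a for a, b in zip(centers, centers[1:])) / 2.0
for x in (1.0, 3.0, 7.5):
    print(x, hesitant_fuzzify(x, centers, sigma))
```

Each attribute value ends up with several candidate membership degrees per fuzzy set, which is the kind of input a hesitant fuzzy set represents. In practice a library K-means and a library SMOTE would likely replace the hand-written pieces.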

CLC Number: