Unbalanced Data Classification Based on Hesitant Fuzzy Decision Tree

Computer Engineering, 2019, Vol. 45, Issue (8): 75-79,91. doi: 10.19678/j.issn.1000-3428.0051759

ZHANG Xu, ZHOU Xinzhi, ZHAO Chengping, SHAO Lun

College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China

Received: 2018-06-06 Revised: 2018-08-27 Online: 2019-08-15 Published: 2019-08-08
About the authors: ZHANG Xu (born 1992), male, master's student, whose main research interests are intelligent control and data mining; ZHOU Xinzhi, professor, Ph.D.; ZHAO Chengping, associate professor, Ph.D.; SHAO Lun, master's student.
Funding: National Key Basic Research Development Program of China (2013CB328903-2).

Abstract: To improve classification performance on unbalanced data, an improved fuzzy decision tree algorithm is proposed that combines hesitant fuzzy set theory with the decision tree algorithm. The unbalanced data is first oversampled with the SMOTE algorithm, the cluster centers of each attribute are obtained by K-means clustering, and the dataset is fuzzified using two different membership functions. On this basis, the Hesitant Fuzzy Information Gain (HFIG) of each attribute is computed from the membership functions and the information energy of hesitant fuzzy sets. The largest HFIG replaces the Fuzzy Information Gain (FIG) of the Fuzzy ID3 algorithm as the attribute splitting criterion, and a Hesitant Fuzzy Decision Tree (HFDT) model is constructed for unbalanced data classification. Experimental results show that, compared with traditional classification algorithms such as C4.5, KNN, and random forest, the HFDT-based classifier improves the AUC evaluation index by 12.6% on average.

Keywords: unbalanced data, hesitant fuzzy sets, Hesitant Fuzzy Decision Tree (HFDT), K-means clustering, Fuzzy ID3 algorithm
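
The preprocessing steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which two membership functions are used, so a triangular and a Gaussian function are assumed here; the `sigma` parameter, the shoulder handling at the outermost centers, and all function names are hypothetical; and the information energy is taken to be the per-element mean of squared membership degrees, summed over elements, which may differ from the paper's exact formula. The HFIG computation itself is not given on this page and is not reproduced.

```python
import math
import random


def smote_point(sample, neighbor, rng):
    """Create one synthetic minority sample on the segment between
    `sample` and one of its minority-class neighbours (SMOTE interpolation)."""
    gap = rng.random()  # random position in [0, 1) along the segment
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's K-means on a single attribute; returns the sorted centres."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption): k evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def triangular(v, centers, i):
    """Triangular membership of v in the fuzzy term centred at centers[i].
    Assumes distinct centres; the first and last terms are open shoulders."""
    c = centers[i]
    if v <= c:
        if i == 0:
            return 1.0
        return max(0.0, (v - centers[i - 1]) / (c - centers[i - 1]))
    if i == len(centers) - 1:
        return 1.0
    return max(0.0, (centers[i + 1] - v) / (centers[i + 1] - c))


def gaussian(v, center, sigma):
    """Gaussian membership of v around `center` with spread `sigma`."""
    return math.exp(-((v - center) ** 2) / (2.0 * sigma ** 2))


def fuzzify(v, centers, sigma):
    """One hesitant fuzzy element per term: the two membership degrees of v."""
    return [[triangular(v, centers, i), gaussian(v, c, sigma)]
            for i, c in enumerate(centers)]


def information_energy(hfs):
    """Sum over elements of the mean squared membership degree."""
    return sum(sum(h * h for h in element) / len(element) for element in hfs)


if __name__ == "__main__":
    rng = random.Random(0)
    print(smote_point([0.0, 0.0], [1.0, 2.0], rng))
    centers = kmeans_1d([0.8, 1.0, 1.2, 4.8, 5.0, 5.2], k=2)
    element = fuzzify(1.0, centers, sigma=2.0)
    print(centers, element, information_energy(element))
```

Each attribute value ends up with two candidate membership degrees per fuzzy term, which is what makes the representation "hesitant"; a tree builder would then score candidate split attributes from these degrees.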

CLC number: TP181

ZHANG Xu, ZHOU Xinzhi, ZHAO Chengping, SHAO Lun. Unbalanced Data Classification Based on Hesitant Fuzzy Decision Tree[J]. Computer Engineering, 2019, 45(8): 75-79,91.

https://www.ecice06.com/CN/Y2019/V45/I8/75



