Imbalanced Data Classification Algorithm Based on CSD-ELM

Computer Engineering, 2019, Vol. 45, Issue 11: 54-61. doi: 10.19678/j.issn.1000-3428.0054988

WANG Dafei^a, XIE Wujie^b, DONG Wenhan^b

a. Graduate School; b. College of Aeronautics Engineering, Air Force Engineering University, Xi'an 710038, China

Received: 2019-05-22 Revised: 2019-06-25 Published: 2019-06-29
About the authors: WANG Dafei (born 1984), male, master's student, whose main research interests are data mining and machine learning; XIE Wujie and DONG Wenhan, professors, Ph.D.
Funding: Aeronautical Science Foundation of China (20141396012).

Abstract

Extreme Learning Machine (ELM) algorithms based on cost-sensitive learning, when applied to imbalanced data classification, fail to consider the distribution characteristics of samples in different classes and the importance of each sample within the same class, both of which influence the classification results. To address this, we propose a method for setting the misclassification penalty factors based on the proportion of sample sizes. In addition, we design a scheme for determining the weights of samples within a class based on Mini-batch k-means clustering and a distance measure. On this basis, we construct hidden-layer output matrices that distinguish the positive and negative classes. According to the relationship between the number of training samples and the number of ELM hidden-layer nodes, we calculate the connection weights between the hidden layer and the output layer in two cases, thus reducing the time complexity of the algorithm. Experimental results show that, compared with ELM, WELM and other algorithms, the proposed algorithm achieves higher values of the G-mean and F1 classification performance indices.
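For orientation, the two-case computation mentioned in the abstract resembles the closed-form solutions used by the weighted ELM (WELM) baseline. The block below is a sketch of that standard weighted form, not the CSD-ELM formula itself, which the abstract does not state. The symbols H, T, W, C, N and L, and the switching condition N < L, are assumptions introduced here for illustration.

```latex
% Assumed notation (not given in the abstract):
%   H : N x L hidden-layer output matrix (N training samples, L hidden nodes)
%   T : N x m target matrix
%   W : N x N diagonal matrix of per-sample weights
%   C : regularization constant, I : identity matrix of matching size
\beta =
\begin{cases}
H^{\top}\left(\dfrac{I}{C} + W H H^{\top}\right)^{-1} W T, & N < L \\[2ex]
\left(\dfrac{I}{C} + H^{\top} W H\right)^{-1} H^{\top} W T, & N \ge L
\end{cases}
```

The first form inverts an N x N matrix and the second an L x L matrix, so picking the form by the smaller of N and L keeps the inversion cost down, which is presumably the kind of saving the abstract refers to.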

Key words: imbalanced data, Extreme Learning Machine (ELM), cost-sensitive learning, Mini-batch k-means clustering, constrained optimization theory
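The G-mean and F1 measures named in the abstract can be computed from a binary confusion matrix. The following is a minimal sketch using the usual definitions, assuming the minority class is labeled as the positive class; the function names and the toy labels are hypothetical and do not come from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem.

    `positive` is the label treated as the positive (assumed minority) class.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def g_mean(tp, fp, tn, fn):
    """Geometric mean of the recall on the positive and negative classes."""
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall_pos * recall_neg)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall on the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (hypothetical labels): 2 positive and 4 negative samples.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)                     # (1, 1, 3, 1)
print(round(g_mean(*counts), 4))  # 0.6124
print(f1_score(*counts))          # 0.5
```

Unlike plain accuracy, both measures drop sharply when the minority class is misclassified, which is why they are commonly reported for imbalanced data.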

CLC number: TP181

WANG Dafei, XIE Wujie, DONG Wenhan. Imbalanced Data Classification Algorithm Based on CSD-ELM[J]. Computer Engineering, 2019, 45(11): 54-61.

http://www.ecice06.com/CN/Y2019/V45/I11/54

Figures/Tables: 5
