Abstract: Extreme Learning Machine (ELM) algorithms based on cost-sensitive learning are effective for imbalanced data classification. However, they do not consider how the distribution of samples across classes, or the importance of individual samples within a class, affects the classification results. To address this, we propose a method that sets the misclassification penalty factor from the ratio of class sample sizes, and we design a scheme for determining within-class sample weights based on Mini-batch k-means clustering and a distance measure. On this basis, we construct a hidden-layer output matrix that distinguishes the positive and negative classes. Depending on the relationship between the number of training samples and the number of ELM hidden-layer nodes, the connection weights between the hidden layer and the output layer are computed in one of two cases, which reduces the time complexity of the algorithm. Experimental results show that the proposed algorithm achieves higher G-mean and F1 scores than ELM, WELM, and other algorithms.
Key words: Extreme Learning Machine (ELM); Mini-batch k-means clustering; constrained optimization theory
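The abstract names two evaluation metrics (G-mean and F1) and a penalty factor derived from class sample proportions, but gives no formulas. The sketch below uses the standard binary definitions of G-mean and F1. The penalty rule in `class_ratio_penalties` (total sample count divided by the number of classes times the class size) is an assumption chosen for illustration, not the paper's formula, and both function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def class_ratio_penalties(labels):
    """Return one penalty factor per class, larger for rarer classes.

    ASSUMPTION: penalty = N / (K * n_class), an inverse-frequency rule
    used here only as an example of a proportion-based setting.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def g_mean_and_f1(y_true, y_pred, positive=1):
    """Compute G-mean and F1 for a binary problem from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    # G-mean: geometric mean of the per-class recalls.
    g_mean = sqrt(recall * specificity)
    # F1: harmonic mean of precision and recall on the positive class.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return g_mean, f1
```

With 2 positive and 8 negative samples, this assumed rule gives the minority class a penalty four times that of the majority class. A classifier that predicts only the majority class gets a G-mean of 0, which is why G-mean is preferred over plain accuracy on imbalanced data.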