
Computer Engineering ›› 2020, Vol. 46 ›› Issue (8): 101-105. doi: 10.19678/j.issn.1000-3428.0055388

• Artificial Intelligence and Pattern Recognition •

Feature Selection Method Based on Maximum Information Coefficient and Redundancy Sharing

YUAN Zheming, YANG Jingjing, CHEN Yuan   

  1. Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-making, Hunan Agricultural University, Changsha 410128, China
  • Received: 2019-07-04  Revised: 2019-08-16  Published: 2019-09-04

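  • About the authors: YUAN Zheming (born 1971), male, professor, Ph.D., main research interests: pattern recognition and machine learning; YANG Jingjing, master's student; CHEN Yuan, lecturer, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (61701177); Natural Science Foundation of Hunan Province (2018JJ3225); Open Project of the State Key Laboratory Cultivation Base for Crop Germplasm Innovation and Resource Utilization of Hunan Province (18KFXM08).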

Abstract: Feature selection is a key step in machine learning and is commonly performed with the minimal Redundancy Maximal Relevance (mRMR) method. However, the relevance measure and the redundancy measure in mRMR are not directly comparable, and the method cannot terminate the introduction of features automatically. To address these problems, this paper proposes MIC-share, a feature selection method based on the Maximum Information Coefficient (MIC) and a redundancy sharing strategy. MIC is used to measure both relevance and redundancy, and the redundancy sharing strategy is used to obtain new feature scores, so that feature introduction stops automatically and the time required to determine the optimal subset is reduced. Simulation results show that, compared with PLSR, MIFS, KNN-FABC, and other feature selection methods, MIC-share yields a lower root mean square error (RMSE) on regression data and a lower error rate on classification data.

Key words: feature selection, Maximum Information Coefficient (MIC), redundancy sharing, classification, Support Vector Machine (SVM), regression
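
The abstract's central idea is that once relevance and redundancy are measured on the same scale, a feature's score can be compared against zero and selection can stop on its own. The following is a minimal sketch of that idea only, under stated assumptions. The `dependence` function is a stand-in (squared Pearson correlation, which lies in [0, 1]) rather than MIC, and the scoring rule (relevance minus mean redundancy with the already selected features) is a placeholder in the style of the mRMR difference form, not the paper's redundancy sharing formula, which the abstract does not give. The names `dependence` and `select_features` are hypothetical.

```python
def dependence(x, y):
    """Stand-in for MIC (assumption): squared Pearson correlation in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no dependence
    return (sxy * sxy) / (sxx * syy)


def select_features(features, target):
    """Greedy forward selection that stops by itself.

    features: dict mapping feature name -> list of numeric values.
    Placeholder score (assumption, not the paper's formula):
        relevance(f) - mean redundancy of f with the selected features.
    Selection ends when no remaining feature has a positive score.
    """
    relevance = {name: dependence(col, target) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining:
        best_name, best_score = None, 0.0
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            if selected:
                redundancy = sum(
                    dependence(features[name], features[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:  # no candidate adds net information: stop
            break
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


# Toy data: the target is the sum of two uncorrelated features,
# and x1_copy duplicates x1 exactly.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
y = [a + b for a, b in zip(x1, x2)]
print(select_features({"x1": x1, "x1_copy": list(x1), "x2": x2}, y))
# prints ['x1', 'x2']
```

In this toy run the duplicate feature is as relevant as `x1` but its redundancy cancels that relevance, so its score never becomes positive and the loop ends without a preset subset size. Replacing `dependence` with a real MIC estimator would change the numbers but not the stopping logic.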

