Multi-label Feature Selection Combining Manifold Learning and Logistic Regression

doi:10.19678/j.issn.1000-3428.0060333

Abstract

Abstract: Multi-label feature selection algorithms usually assume that a certain relationship holds between the data and the labels, and solve the feature selection problem by building on that relationship under the constraint of regularization terms. The actual relationship, however, may be a combination of two or more relationships. To describe the relationship between data and labels accurately and to remove irrelevant and redundant features, a multi-label feature selection algorithm named FSML is proposed based on the logistic regression model and the label manifold structure. The loss function of the logistic regression model is used to learn the regression coefficient matrix, and the label manifold structure is used to learn the weight matrix of the data features. The L_{2,1}-norm then combines the coefficient matrix and the weight matrix flexibly, constraining the sparsity of both matrices and realizing multi-label feature selection. Experimental results on classic multi-label datasets show that, compared with feature selection algorithms such as CMLS and SCLS, FSML performs well on five evaluation metrics (Hamming Loss, Ranking Loss, One-error, Coverage, and Average Precision) and describes the relationship between data and labels more accurately.

Key words: multi-label learning, feature selection, logistic regression, L_{2,1}-norm, manifold structure
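The role of the L_{2,1}-norm named in the abstract can be illustrated with a minimal sketch. It assumes a coefficient matrix W with one row per feature and one column per label; this layout is an assumption made for illustration and is not stated in the abstract. Under that assumption, the L_{2,1}-norm is the sum of the Euclidean norms of the rows, so penalizing it tends to drive whole rows toward zero, and features can then be ranked by row norm. The function names and the toy matrix are hypothetical, and this is not the FSML algorithm itself.

```python
import math


def l21_norm(W):
    """Return the L_{2,1}-norm of W: the sum of the Euclidean norms of its rows."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def rank_features(W):
    """Return feature (row) indices sorted by descending Euclidean row norm."""
    row_norms = [math.sqrt(sum(w * w for w in row)) for row in W]
    return sorted(range(len(W)), key=lambda i: row_norms[i], reverse=True)


# Hypothetical coefficient matrix: 4 features (rows) x 2 labels (columns).
W = [
    [3.0, 4.0],   # row norm 5
    [0.0, 0.0],   # row norm 0: this feature contributes to no label
    [1.0, 0.0],   # row norm 1
    [5.0, 12.0],  # row norm 13
]

print(l21_norm(W))       # 19.0
print(rank_features(W))  # [3, 0, 2, 1]
```

In this toy case the all-zero row ranks last, which is the sense in which row sparsity corresponds to discarding a feature for every label at once.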

CLC number: TP181

ZHANG Yao, MA Yingcang, ZHU Hengdong, LI Heng, CHEN Cheng. Multi-label Feature Selection Combining Manifold Learning and Logistic Regression[J]. Computer Engineering, 2022, 48(3): 90-99,106.

http://www.ecice06.com/CN/Y2022/V48/I3/90

Figures/Tables: 9

