Computer Engineering, 2022, Vol. 48, Issue (3): 90-99, 106. doi: 10.19678/j.issn.1000-3428.0060333

• Artificial Intelligence and Pattern Recognition •

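Multi-label Feature Selection Combining Manifold Learning and Logistic Regression

ZHANG Yao, MA Yingcang, ZHU Hengdong, LI Heng, CHEN Cheng

  1. School of Science, Xi'an Polytechnic University, Xi'an 710600, China
  • Received: 2020-12-21 Revised: 2021-01-25 Published: 2021-02-24
  • About the authors: ZHANG Yao (born 1994), male, master's student, main research interests in machine learning and clustering; MA Yingcang (corresponding author), professor, Ph.D.; ZHU Hengdong, LI Heng, and CHEN Cheng, master's students.
  • Funding:
    National Natural Science Foundation of China (61976130); Key Research and Development Program of Shaanxi Province (2018KW-021); Natural Science Foundation of Shaanxi Province (2020JQ-923).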

Abstract: Multi-label feature selection algorithms usually assume that a certain relationship holds between the data and the labels, and solve the feature selection problem by building on that relationship under the constraints of regularization terms. The actual relationship, however, may be a combination of two or more relationships. To describe the relationship between data and labels accurately and to remove irrelevant and redundant features, a multi-label feature selection algorithm named FSML is proposed based on the logistic regression model and the label manifold structure. The loss function of the logistic regression model is used to learn the regression coefficient matrix, and the label manifold structure is used to learn the weight matrix of the data features. The L2,1-norm then flexibly combines the coefficient matrix and the weight matrix, constraining the sparsity of both matrices and realizing multi-label feature selection. Experimental results on classic multi-label datasets show that, compared with feature selection algorithms such as CMLS and SCLS, FSML performs well on five evaluation metrics (Hamming Loss, Ranking Loss, One-error, Coverage, and Average Precision) and describes the relationship between data and labels more accurately.
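
The role of the L2,1-norm mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the FSML objective function, so the code below does not reproduce it; it only shows the standard definition of the L2,1-norm (the sum of the Euclidean norms of a matrix's rows) and the common practice of ranking features by the row norms of a learned matrix. The function names `l21_norm` and `rank_features` and the matrix `W` are hypothetical and chosen for illustration.

```python
import numpy as np

def l21_norm(M: np.ndarray) -> float:
    """Standard L2,1-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def rank_features(W: np.ndarray) -> np.ndarray:
    """Order feature indices from largest to smallest row norm.

    Assumption: each row of W corresponds to one feature, so a row that
    the sparsity penalty has driven to (near) zero marks a feature that
    can be dropped.
    """
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms, kind="stable")

# Hypothetical 4-feature, 2-label matrix; the values are illustrative only.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

print(l21_norm(W))       # row norms 5, 0, 1, 2 sum to 8.0
print(rank_features(W))  # [0 3 2 1]
```

Because the penalty sums whole-row norms, minimizing it tends to zero out entire rows at once, which is why it is often used for selecting features jointly across all labels.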

Key words: multi-label learning, feature selection, logistic regression, L2,1-norm, manifold structure
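
Two of the five evaluation metrics named in the abstract, Hamming Loss and One-error, are simple enough to sketch directly. The page does not define them, so the code assumes the definitions commonly used in the multi-label literature; the function names, the 0.5 decision threshold, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hamming_loss(Y_true: np.ndarray, Y_pred: np.ndarray) -> float:
    """Fraction of instance-label pairs that are predicted incorrectly."""
    return float(np.mean(Y_true != Y_pred))

def one_error(Y_true: np.ndarray, scores: np.ndarray) -> float:
    """Fraction of instances whose top-scored label is not a true label."""
    top = np.argmax(scores, axis=1)
    hits = Y_true[np.arange(Y_true.shape[0]), top]
    return float(np.mean(hits == 0))

# Toy data: 2 instances, 3 labels (illustrative values only).
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.4],
                   [0.6, 0.3, 0.1]])
Y_pred = (scores >= 0.5).astype(int)  # assumed threshold of 0.5

print(hamming_loss(Y_true, Y_pred))  # 3 wrong pairs out of 6 -> 0.5
print(one_error(Y_true, scores))     # top label wrong for 1 of 2 -> 0.5
```

For both metrics a lower value indicates better performance.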

CLC number: