Fisher Score Fast Multi-Label Feature Selection Algorithm Based on Text Classification

Computer Engineering, 2022, Vol. 48, Issue (2): 113-124. doi: 10.19678/j.issn.1000-3428.0060594

WANG Zhengkai¹, SHEN Dongsheng², WANG Chenxi²

1. Fujian Key Laboratory of Granular Computing and Application, Zhangzhou, Fujian 363000, China;
2. College of Computer, Minnan Normal University, Zhangzhou, Fujian 363000, China

Received: 2021-01-14 Revised: 2021-02-23 Published: 2021-02-26
About the authors: WANG Zhengkai (born 1995), male, master's student, whose main research interests are multi-label learning and machine learning; SHEN Dongsheng and WANG Chenxi, associate professors, master's degree.
Funding: Natural Science Foundation of Fujian Province (2020J01811).

Abstract

Abstract: Fisher Score (FS) is a fast and efficient indicator of a feature's classification ability. However, the traditional FS indicator cannot be applied directly to multi-label learning, nor can it effectively handle the error between the computed class center and the actual class center caused by extreme sample values. This paper proposes an FS-based multi-label feature selection algorithm that combines centroid shift with the correlation of multi-label sets. Under each label, the algorithm finds the extreme point of each class of samples and uses the distance from that extreme point to the class center, multiplied by a radius coefficient, to filter the samples. This yields a more densely distributed sample set, on which the FS of each feature is calculated. The algorithm then traverses every label in the label sets of all samples, adaptively assigning label weights to samples that carry more labels during the traversal, and obtains the average FS of each feature over all labels. Finally, the features are ranked by FS and filtered to obtain the target subset. Parameter analysis and a comparison on 5 performance indicators are carried out on 8 public multi-label text datasets. The results show that the proposed algorithm is reasonably effective and robust, and that it outperforms multi-label feature selection algorithms such as MLNB, MLRF, PMU, and MLACO on most indicators.

Key words: multi-label classification, feature selection, Fisher Score (FS) index, distance measure, inter-class divergence
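The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes that each label is treated as a binary (positive/negative) problem, that distances are Euclidean, and that label weights are uniform unless supplied by the caller (the paper's adaptive weighting scheme is not specified here). The names `fisher_score`, `keep_dense_samples`, `multilabel_fisher_score`, `radius_coef`, and `label_weights` are hypothetical.

```python
import numpy as np


def fisher_score(X, y):
    """Classical Fisher Score per feature: between-class over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    # Guard against zero within-class scatter (constant features).
    return between / np.maximum(within, 1e-12)


def keep_dense_samples(Xc, radius_coef):
    """Keep samples within radius_coef * (distance of the farthest sample to the center).

    Assumption: Euclidean distance to the class mean; the paper may differ.
    """
    dist = np.linalg.norm(Xc - Xc.mean(axis=0), axis=1)
    kept = Xc[dist <= radius_coef * dist.max()]
    # Fall back to the full class if the filter would remove every sample.
    return kept if len(kept) > 0 else Xc


def multilabel_fisher_score(X, Y, radius_coef=0.9, label_weights=None):
    """Weighted average of per-label Fisher Scores computed on filtered samples.

    X: (n_samples, n_features) array; Y: (n_samples, n_labels) 0/1 array.
    label_weights defaults to uniform (an assumption, not the paper's scheme).
    """
    n_labels = Y.shape[1]
    if label_weights is None:
        label_weights = np.ones(n_labels)
    label_weights = np.asarray(label_weights, dtype=float)

    scores = np.zeros(X.shape[1])
    total_weight = 0.0
    for l in range(n_labels):
        parts, targets = [], []
        for c in (0, 1):
            Xc = X[Y[:, l] == c]
            if len(Xc) == 0:
                continue
            Xc = keep_dense_samples(Xc, radius_coef)
            parts.append(Xc)
            targets.append(np.full(len(Xc), c))
        if len(parts) < 2:
            continue  # label has a single class; it carries no ranking information
        scores += label_weights[l] * fisher_score(np.vstack(parts), np.concatenate(targets))
        total_weight += label_weights[l]
    return scores / total_weight if total_weight > 0 else scores
```

Under these assumptions, a feature subset of size `k` would be taken as `np.argsort(scores)[::-1][:k]`, where `scores` is the array returned by `multilabel_fisher_score`.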

CLC number: TP391

WANG Zhengkai, SHEN Dongsheng, WANG Chenxi. Fisher Score Fast Multi-Label Feature Selection Algorithm Based on Text Classification[J]. Computer Engineering, 2022, 48(2): 113-124.

http://www.ecice06.com/CN/Y2022/V48/I2/113

Figures/Tables: 18

