Multi-label Feature Selection Based on Feature Weighted Interaction

doi:10.19678/j.issn.1000-3428.0260013

Abstract

Multi-label fuzzy data commonly suffer from feature redundancy, complex interactions among features, and large differences in feature importance, all of which limit the classification performance of multi-label learning. To address these issues, the ReliefF-β algorithm is proposed for assigning feature weights, and a multi-label feature selection method based on feature weighted interaction is presented. First, feature similarity and label similarity are constructed for multi-label fuzzy data, a regulating parameter β is introduced to fuse the two into a global sample similarity, and ReliefF-β is proposed to weight the features. Second, a multi-label weighted fuzzy rough set is introduced on the basis of these feature weights; uncertainty measures such as weighted fuzzy entropy and weighted fuzzy mutual information are defined, and their properties and relationships are studied. Third, a feature weighted evaluation function is defined that jointly considers feature relevance, redundancy, and interaction, yielding a multi-label feature selection algorithm based on feature weighted interaction. Finally, comparative experiments are conducted under two classifiers. Compared with the other algorithms, under ML-KNN the proposed method raises Average Precision (AP) by 8.79% on average and lowers Hamming Loss (HL), Ranking Loss (RL), Coverage (CV), and One-Error (OE) by 5.06%, 15.33%, 10.97%, and 23.06% on average, respectively. Under BRDT, AP rises by 4.06% on average, and HL, RL, CV, and OE fall by 8.60%, 10.28%, 7.19%, and 5.89% on average, respectively. Ablation studies and statistical tests further verify the effectiveness of the proposed method.
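The five reported metrics move in different directions: a higher AP is better, while lower HL, RL, CV, and OE are better. The sketch below computes them with NumPy using the definitions common in the multi-label literature; it is not the authors' evaluation code. The function name `multilabel_metrics`, the 0.5 decision threshold, and the toy data are assumptions made for illustration, and the sketch assumes every sample has at least one relevant and one irrelevant label.

```python
import numpy as np

def multilabel_metrics(Y, scores, threshold=0.5):
    """Y: (n, q) binary ground truth; scores: (n, q) real-valued confidences.

    Assumption: every row of Y has at least one relevant and one
    irrelevant label, so the per-sample averages below are well defined.
    """
    Y = np.asarray(Y, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n, q = Y.shape

    # ranks[i, j] = position of label j for sample i (1 = highest score).
    order = np.argsort(-scores, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, q + 1)

    # Hamming Loss: fraction of label slots predicted wrongly.
    hl = np.mean((scores >= threshold) != Y)
    # One-Error: fraction of samples whose top-ranked label is irrelevant.
    oe = np.mean(~Y[np.arange(n), order[:, 0]])
    # Coverage: steps down the ranking needed to cover all relevant labels.
    cv = np.mean(np.where(Y, ranks, 0).max(axis=1) - 1)

    rl, ap = 0.0, 0.0
    for i in range(n):
        pos, neg = scores[i, Y[i]], scores[i, ~Y[i]]
        # Ranking Loss: share of (relevant, irrelevant) pairs ordered wrongly.
        rl += np.mean(pos[:, None] <= neg[None, :])
        # Average Precision: precision at the rank of each relevant label.
        r = np.sort(ranks[i, Y[i]])
        ap += np.mean(np.arange(1, r.size + 1) / r)

    return {"AP": ap / n, "HL": hl, "RL": rl / n, "CV": cv, "OE": oe}

# Toy example (hypothetical data): 2 samples, 3 labels.
Y = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.1]]
print(multilabel_metrics(Y, scores))
```

On this toy input the first sample is ranked perfectly and the second places its only relevant label second, giving AP = 0.75, HL = 1/3, RL = 0.25, CV = 1.0, and OE = 0.5.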

YANG Xinyi, MA Jianmin, MA Yupo. Multi-label Feature Selection Based on Feature Weighted Interaction[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260013.

