MFG-FS: Partial Label Feature Selection Based on Multi-source Fuzzy Granulation

doi:10.19678/j.issn.1000-3428.0252706

Abstract

Feature selection identifies informative features in complex data and improves the efficiency of information processing. In partial label data, however, traditional feature selection methods struggle because labels are ambiguous, relationships among samples are complex, and feature importance is hard to evaluate. To address these challenges, this paper proposes MFG-FS, a feature selection framework for partial label datasets. First, to handle label ambiguity, an end-to-end disambiguation method based on the MLP-Mixer model and contrastive learning is designed; it optimizes the feature representation space and strengthens discriminative power, yielding more reliable label confidences. Second, to characterize the complex sample relationships in partial label data, a fuzzy similarity relation information granule is constructed that fuses multi-source information: local structure in the feature space, global correlations derived from the disambiguated labels, and label constraints. Third, on the basis of these fuzzy information granules, a fuzzy mutual information measure is defined and used for feature evaluation; it quantifies both the relevance of a feature subset to the labels and the redundancy within the subset, providing a reliable basis for selecting high-quality feature subsets. Finally, experiments on five synthetic and four real-world datasets show that MFG-FS selects more discriminative and robust feature subsets and performs well in partial label disambiguation and classification accuracy.
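The abstract does not give the paper's formulas, so the following is only a minimal sketch of how a fuzzy mutual information score between a feature and disambiguated label confidences might be computed. It assumes a commonly used form of fuzzy information entropy, H(R) = -(1/n) Σ log2(|[x_i]_R| / n), with relations intersected by element-wise minimum. All function names are hypothetical, and the MLP-Mixer disambiguation step and the multi-source fusion of local and global information are not modeled.

```python
import math

def fuzzy_similarity(column):
    """Assumed feature relation: r_ij = 1 - |x_i - x_j| / (max - min)."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant feature
    return [[1.0 - abs(a - b) / span for b in column] for a in column]

def label_similarity(confidences):
    """Assumed label relation: overlap of two confidence vectors, sum_k min(p_ik, p_jk)."""
    return [[sum(min(a, b) for a, b in zip(p, q)) for q in confidences]
            for p in confidences]

def intersect(r1, r2):
    """Intersection of two fuzzy relations by element-wise minimum."""
    return [[min(a, b) for a, b in zip(row1, row2)] for row1, row2 in zip(r1, r2)]

def fuzzy_entropy(r):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), where |[x_i]_R| is the i-th row sum."""
    n = len(r)
    return -sum(math.log2(sum(row) / n) for row in r) / n

def fuzzy_mutual_information(r_a, r_b):
    """I(A; B) = H(A) + H(B) - H(A, B), with the joint relation given by intersect."""
    return fuzzy_entropy(r_a) + fuzzy_entropy(r_b) - fuzzy_entropy(intersect(r_a, r_b))

# Illustrative data only: four samples, two features, two candidate classes.
f1 = [0.0, 0.1, 0.9, 1.0]   # varies with the label grouping
f2 = [0.0, 1.0, 0.1, 0.9]   # cuts across the label grouping
conf = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]  # disambiguated confidences

r_y = label_similarity(conf)
score_f1 = fuzzy_mutual_information(fuzzy_similarity(f1), r_y)
score_f2 = fuzzy_mutual_information(fuzzy_similarity(f2), r_y)
print(score_f1, score_f2)
```

On this toy data the first feature, whose values follow the label grouping, receives the higher score. A greedy selector could rank features by such a score and subtract a redundancy term computed the same way between candidate and already selected features, although whether MFG-FS does exactly this is not stated in the abstract.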

Wu Qiannan, Ding Weiping, Fan Xiaoxue, Ju Hengrong, Zhou Linlin, Wang Jing. MFG-FS: Partial Label Feature Selection Based on Multi-source Fuzzy Granulation[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252706.

