Computer Engineering (计算机工程)

MFG-FS: Partial Label Feature Selection Based on Multi-source Fuzzy Granulation

Published: 2025-10-31

Abstract: Feature selection identifies informative features in complex data and thereby improves information-processing efficiency. In partial label data, however, where each sample is annotated with a set of candidate labels of which only one is correct, traditional feature selection methods face significant challenges: the labels are ambiguous, the relationships between samples are complex, and feature importance is difficult to evaluate. To address these challenges, this paper proposes MFG-FS, a feature selection framework for partial label datasets. First, to tackle label ambiguity, we design an end-to-end disambiguation method based on the MLP-Mixer model and contrastive learning, which optimizes the sample feature representation space to enhance its discriminative power and yield more reliable label confidences. Second, to accurately characterize the complex sample relationships in partial label data, we construct fuzzy similarity-relation information granules that fuse multi-source information: the local structure of the feature space, global correlations derived from the disambiguated labels, and label constraints. Third, on top of these fuzzy information granules, we define a fuzzy mutual information measure for feature evaluation that quantifies both the relevance between a feature subset and the labels and the redundancy within the subset, providing a reliable basis for selecting high-quality feature subsets. Finally, experiments on five synthetic and four real-world datasets show that MFG-FS selects more discriminative and robust feature subsets and performs well in both partial label disambiguation and classification accuracy.
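The abstract only outlines the fuzzy-granulation and fuzzy-mutual-information stages, so the following Python sketch illustrates the general technique under stated assumptions; it is not the MFG-FS implementation. It assumes disambiguation has already produced a normalized label-confidence vector per sample (`conf`) and a non-empty candidate label set per sample (`candidates`). Each feature induces the fuzzy similarity R_a(x, y) = 1 - |a(x) - a(y)| after min-max scaling, relations are intersected with the min t-norm, and information is measured with a common fuzzy-rough entropy H(R) = -(1/n) Σ_i log2(|[x_i]_R| / n), giving I(R1; R2) = H(R1) + H(R2) - H(R1 ∧ R2). The label-side relation `label_relation` is a hypothetical simplification that combines only the confidence overlap (global label information) and the candidate-set constraint; it omits the local feature-space term the paper also fuses. The greedy forward search is a swapped-in standard scheme, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Set

Matrix = List[List[float]]


def feature_relation(column: Sequence[float]) -> Matrix:
    """R_a(x, y) = 1 - |a(x) - a(y)| after min-max scaling the feature to [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    col = [((v - lo) / span) if span else 0.0 for v in column]
    return [[1.0 - abs(u - v) for v in col] for u in col]


def label_relation(conf: Matrix, candidates: List[Set[int]]) -> Matrix:
    """Hypothetical label-side relation: overlap of label-confidence vectors,
    forced to 0 when two samples share no candidate label (label constraint).
    Assumes each confidence vector sums to 1, so the diagonal equals 1."""
    n = len(conf)
    return [[(sum(min(p, q) for p, q in zip(conf[i], conf[j]))
              if candidates[i] & candidates[j] else 0.0)
             for j in range(n)] for i in range(n)]


def meet(r1: Matrix, r2: Matrix) -> Matrix:
    """Intersect two fuzzy relations with the min t-norm."""
    return [[min(a, b) for a, b in zip(row1, row2)] for row1, row2 in zip(r1, r2)]


def fuzzy_entropy(rel: Matrix) -> float:
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), with |[x_i]_R| = sum_j R(x_i, x_j)."""
    n = len(rel)
    return -sum(math.log2(sum(row) / n) for row in rel) / n


def fuzzy_mi(r1: Matrix, r2: Matrix) -> float:
    """Fuzzy mutual information I(R1; R2) = H(R1) + H(R2) - H(R1 meet R2)."""
    return fuzzy_entropy(r1) + fuzzy_entropy(r2) - fuzzy_entropy(meet(r1, r2))


def select_features(X: Matrix, conf: Matrix, candidates: List[Set[int]],
                    k: int, tol: float = 1e-12) -> List[int]:
    """Greedy forward selection: add the feature that most increases the fuzzy MI
    between the whole selected subset and the labels; a feature that is redundant
    with the subset leaves the subset relation (almost) unchanged and adds no gain."""
    n, d = len(X), len(X[0])
    feat_rels = [feature_relation([row[a] for row in X]) for a in range(d)]
    lab_rel = label_relation(conf, candidates)
    subset_rel: Matrix = [[1.0] * n for _ in range(n)]  # empty subset: all samples alike
    current_mi = 0.0
    selected: List[int] = []
    while len(selected) < min(k, d):
        scores = {a: fuzzy_mi(meet(subset_rel, feat_rels[a]), lab_rel)
                  for a in range(d) if a not in selected}
        best = max(scores, key=scores.get)
        if scores[best] - current_mi <= tol:  # no remaining feature adds information
            break
        selected.append(best)
        subset_rel = meet(subset_rel, feat_rels[best])
        current_mi = scores[best]
    return selected


X = [[0.0, 5.0, 0.3], [0.1, 5.0, 0.9], [1.0, 5.0, 0.2], [0.9, 5.0, 0.8]]
conf = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
candidates = [{0, 1}, {0}, {0, 1}, {1}]
print(select_features(X, conf, candidates, k=2))  # prints [0]
```

On this toy data, feature 0 tracks the two confidence groups and is selected first. The constant feature 1 leaves the subset relation unchanged, so its gain is exactly zero, and adding feature 2 lowers the subset-level MI, so the search stops after one feature. This fuzzy MI is not guaranteed to grow monotonically as features are added, which is why the loop checks the gain instead of assuming it is non-negative.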