Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 110-119. doi: 10.19678/j.issn.1000-3428.0066458

• Artificial Intelligence and Pattern Recognition •

Fuzzy Rough Set Feature Selection Based on Inconsistent Nearest Neighbors

Jie ZHAO*, Wenhao YE, Zhouyang LIANG, Jianxin CHEN, Zhenning DONG

  1. School of Management, Guangdong University of Technology, Guangzhou 510520, Guangdong, China
  • Received: 2022-12-07 Published: 2024-01-15 Online: 2023-03-20
  • Contact: Jie ZHAO
  • Funding:
    National Natural Science Foundation of China (71871069); National Natural Science Foundation of China (72271063)

Abstract:

Fuzzy rough sets overcome the limitation that classical rough sets can handle only discrete data, and they enable effective feature selection on continuous numerical data. However, fuzzy rough set computation is object-centered and has high time complexity, which makes high-dimensional and large-scale data difficult to handle. To address this, an inconsistent nearest neighbor acceleration strategy based on level cut sets is proposed. The strategy tracks the fuzzy nearest neighbor set of each object in the universe and continuously prunes the neighbors that do not affect the computation. When all inconsistent nearest neighbors of an object have been pruned, the object itself is removed, which improves efficiency. In addition, an attribute significance measure based on the decrease in inconsistent nearest neighbors is designed; it effectively suppresses the selection of redundant features and improves both efficiency and classification accuracy. It is proven theoretically that the proposed acceleration strategy and attribute significance measure do not change the order in which attributes are selected. On this basis, a new fuzzy rough set feature selection algorithm is proposed. Experiments on nine UCI and scikit datasets show that the algorithm not only shortens running time effectively but also achieves high classification accuracy. Compared with the FA-FSCE, AVDP, and IV-FS-FRS-2 algorithms, it reduces running time by at least 9.44%, and by 61.01% to 99.54% on high-dimensional and large-scale datasets. Classification accuracy with Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers improves by up to 11.20% and 19.95%, respectively.

Key words: fuzzy rough set, feature selection, level-set, inconsistent nearest neighbors, significance of attributes
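
The abstract gives no formal definitions, so the following Python sketch is only one plausible reading of the pruning idea, not the authors' algorithm. It assumes a per-attribute fuzzy similarity of 1 - |x_i - x_j| on min-max scaled values, a level cut threshold `lam`, and that attributes combine by intersecting their level cuts. The names `fuzzy_similarity` and `select_features` are hypothetical. An inconsistent neighbor of an object is taken here to be a neighbor inside the level cut that carries a different class label.

```python
import numpy as np


def fuzzy_similarity(column):
    """Pairwise similarity 1 - |x_i - x_j| on a min-max scaled column (assumed relation)."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    scaled = (column - column.min()) / span if span > 0 else np.zeros_like(column)
    return 1.0 - np.abs(scaled[:, None] - scaled[None, :])


def select_features(X, y, lam=0.8):
    """Greedy forward selection driven by the decrease in inconsistent neighbors."""
    n_objects, n_attrs = X.shape
    # cuts[a][i, j] is True when j lies in the lam-level cut of i's neighborhood under attribute a.
    cuts = [fuzzy_similarity(X[:, a]) >= lam for a in range(n_attrs)]
    # inconsistent[i, j] is True while j is still a neighbor of i with a different label.
    inconsistent = y[:, None] != y[None, :]
    selected, remaining = [], list(range(n_attrs))

    while remaining and inconsistent.any():
        # Objects whose inconsistent neighbor set is empty are skipped from now on.
        active = inconsistent.any(axis=1)
        total = int(inconsistent[active].sum())
        # Number of inconsistent neighbors that would survive adding each candidate.
        counts = {a: int((inconsistent[active] & cuts[a][active]).sum()) for a in remaining}
        best = min(remaining, key=counts.get)
        if counts[best] == total:
            break  # no candidate removes any further inconsistent neighbor
        inconsistent &= cuts[best]  # prune neighbors separated by the chosen attribute
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch an attribute that removes no inconsistent neighbors is never selected, which loosely mirrors the abstract's claim that the significance measure suppresses redundant features. The time savings reported in the abstract depend on details of the paper that are not reproduced here.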