
Computer Engineering, 2026, Vol. 52, Issue (1): 144-153. doi: 10.19678/j.issn.1000-3428.0069985

• Computational Intelligence and Pattern Recognition •

基于邻域粒度条件熵的动态萤火虫特征选择算法

吴国霞1, 邱雅茹2, 江峰1,*

  1. 青岛科技大学信息科学技术学院, 山东 青岛 266061
  2. 青岛科技大学数据科学学院, 山东 青岛 266061
  • 收稿日期:2024-06-11 修回日期:2024-08-21 出版日期:2026-01-15 发布日期:2024-11-06
  • 通讯作者: 江峰
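  • Author biographies:

    WU Guoxia (CCF student member), female, master's student; research interests: machine learning, rough sets, and data mining

    QIU Yaru, master's student

    JIANG Feng (corresponding author), professor

  • Funding:
    National Natural Science Foundation of China (61973180); National Natural Science Foundation of China (62172249); Natural Science Foundation of Shandong Province (ZR2022MF326)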

Dynamic Firefly Algorithm for Feature Selection Based on Neighborhood Granularity Conditional Entropy

WU Guoxia1, QIU Yaru2, JIANG Feng1,*

  1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China
  2. College of Data Science, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China
  • Received: 2024-06-11 Revised: 2024-08-21 Published: 2026-01-15 Online: 2024-11-06
  • Contact: JIANG Feng

摘要:

针对传统的萤火虫算法(FA)在处理优化问题时存在的收敛速度慢、易陷入局部最优解等问题, 提出一种动态的萤火虫算法, 并将该算法与邻域粗糙集相关理论相结合开展特征选择的研究, 从而实现对连续型数值的有效处理, 并且有效提高特征选择的性能。首先, 为了改进萤火虫算法的搜索策略, 引入POX(Precedence Operation Crossover)变异策略并采用阈值设置控制萤火虫交叉变异的概率, 便于陷入局部最优的个体及时跳出, 提出一种动态的萤火虫算法; 其次, 为了能够同时考虑到知识完备性和知识粒度大小, 将邻域粗糙集中的邻域知识粒度与条件熵有机结合, 提出一种新的信息熵模型, 即邻域粒度条件熵; 最后, 提出一种基于邻域粒度条件熵与动态萤火虫算法的特征选择算法FS_NGHFAPOX, 该算法采用邻域粒度条件熵来构建适应度函数, 进而更好地评价特征子集。在UCI和scikit-learn机器学习库中的内置数据库中部分数据集上进行实验验证, 验证结果表明FS_NGHFAPOX算法分类性能最优且所选特征子集数量更少, 平均准确率达到0.83, 相较于其他特征选择算法最多提高了15%。

关键词: 特征选择, 萤火虫算法, 变异策略, 适应度函数, 邻域知识粒度, 邻域粒度条件熵

Abstract:

To address the slow convergence of the traditional Firefly Algorithm (FA) and its tendency to become trapped in local optima when solving optimization problems, this paper proposes a dynamic firefly algorithm and combines it with neighborhood rough set theory for feature selection, so that continuous-valued data can be handled effectively and feature selection performance improves. First, to improve the FA search strategy, the Precedence Operation Crossover (POX) mutation strategy is introduced, and a threshold controls the probability of crossover and mutation among fireflies, which helps individuals trapped in local optima escape in time. Second, to account for both knowledge completeness and knowledge granularity, the neighborhood knowledge granularity from neighborhood rough sets is combined with conditional entropy to form a new information entropy model, the neighborhood granularity conditional entropy. Finally, a feature selection algorithm, FS_NGHFAPOX, is proposed on the basis of the neighborhood granularity conditional entropy and the dynamic firefly algorithm; it uses the neighborhood granularity conditional entropy to construct the fitness function and thereby evaluate feature subsets more effectively. Experiments on several datasets from the UCI repository and from the built-in datasets of the scikit-learn machine learning library show that FS_NGHFAPOX achieves the best classification performance among the compared algorithms while selecting fewer features. Its average accuracy on the experimental datasets is 0.83, up to 15% higher than that of the other feature selection algorithms.

Key words: feature selection, Firefly Algorithm (FA), mutation strategy, fitness function, neighborhood knowledge granularity, neighborhood granularity conditional entropy
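
The two ingredients named in the abstract, conditional entropy and knowledge granularity over neighborhoods, can be illustrated with a minimal Python sketch. It uses forms of both measures that are common in the neighborhood rough set literature; the abstract does not give the paper's exact definition of the neighborhood granularity conditional entropy, so the function names, the Euclidean distance, the radius `delta`, and especially the product in `illustrative_fitness` are assumptions made for illustration, not the authors' formulas.

```python
import math


def neighborhoods(X, features, delta):
    """For each sample, return the indices of all samples within distance
    delta, using Euclidean distance over the chosen feature columns."""
    n = len(X)
    result = []
    for i in range(n):
        near = set()
        for j in range(n):
            dist = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            if dist <= delta:
                near.add(j)
        result.append(near)
    return result


def neighborhood_conditional_entropy(X, y, features, delta):
    """-(1/n) * sum_i log(|neighbors of x_i sharing its label| / |neighbors of x_i|).
    It is 0 when every neighborhood is pure with respect to the class label."""
    total = 0.0
    for i, near in enumerate(neighborhoods(X, features, delta)):
        same_label = sum(1 for j in near if y[j] == y[i])
        total -= math.log(same_label / len(near))
    return total / len(X)


def neighborhood_granularity(X, features, delta):
    """(1/n^2) * sum_i |neighbors of x_i|; smaller values mean finer granules."""
    n = len(X)
    return sum(len(near) for near in neighborhoods(X, features, delta)) / (n * n)


def illustrative_fitness(X, y, features, delta):
    # ASSUMPTION: this product is only a placeholder for "combining" the two
    # measures; the paper's actual combination is not stated in the abstract.
    return (neighborhood_granularity(X, features, delta)
            * neighborhood_conditional_entropy(X, y, features, delta))


# Toy data (hypothetical): column 0 separates the classes, column 1 does not.
X = [[0.0, 0.50], [0.1, 0.10], [0.9, 0.45], [1.0, 0.15]]
y = [0, 0, 1, 1]

for subset in ([0], [1]):
    print(subset,
          neighborhood_conditional_entropy(X, y, subset, 0.2),
          neighborhood_granularity(X, subset, 0.2))
```

On this toy data the informative column yields pure neighborhoods and therefore a lower conditional entropy than the uninformative one, which is the kind of signal a fitness function for a search-based feature selector would rely on.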