Computer Engineering, 2010, Vol. 36, Issue (19): 39-41. doi: 10.3969/j.issn.1000-3428.2010.19.013

• Software Technology and Database •

Feature Selection Based on Rough Set and Pansystems Equivalence Operator

ZHU Hao-dong 1,2, ZHONG Yong 1,2

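  (1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China; 2. Graduate School of Chinese Academy of Sciences, Beijing 100039, China)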
  • Publication date: 2010-10-05; Posted online: 2010-09-27
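  • About the authors: ZHU Hao-dong (born 1980), male, PhD; main research interests: software process and text mining. ZHONG Yong, research professor and doctoral supervisor.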
  • Funding:
    Science and Technology Plan Project of Sichuan Province (2008GZ0003); Key Technologies R&D Program of Sichuan Province (07GG006-019)


Abstract: The feature subsets selected by traditional feature selection methods contain redundant features and are not sufficiently representative. To address this problem, this paper presents a feature selection method based on Rough Set (RS) and the pansystems equivalence operator. The method extracts initial features using document frequency based on minimum word frequency, extends RS with the pansystems equivalence operator, and gives an attribute reduction algorithm that eliminates redundancy, yielding a more representative feature subset. Experimental results show that the method achieves relatively high precision and recall.

Key words: text categorization, document frequency, pansystems equivalence operator, Rough Set (RS), attribute reduction
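The abstract names the two stages of the pipeline (DF-based initial feature extraction, then rough-set attribute reduction) but does not give their formulas. The Python sketch below illustrates the general shape under stated assumptions. `df_with_min_tf` assumes that "document frequency based on minimum word frequency" means a term counts toward a document's DF only if it occurs at least `min_tf` times in that document. The reduction step uses the classical Pawlak indiscernibility relation with a QuickReduct-style greedy search on the positive region. This stands in for the paper's pansystems-equivalence-operator extension, which the abstract does not define. All function names, parameters, and the toy corpus are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: classical rough-set reduction, NOT the paper's
# pansystems-equivalence-operator extension (not specified in the abstract).


def df_with_min_tf(docs, min_tf=2, min_df=1):
    """Assumed DF variant: a term counts toward DF only in documents where
    it occurs at least min_tf times. Returns sorted terms with DF >= min_df."""
    df = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        df.update(t for t, c in tf.items() if c >= min_tf)
    return sorted(t for t, n in df.items() if n >= min_df)


def build_table(docs, features):
    """Binary decision table: row[f] is 1 if term f occurs in the document."""
    rows = []
    for tokens in docs:
        present = set(tokens)
        rows.append({f: int(f in present) for f in features})
    return rows


def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())


def positive_count(rows, labels, attrs):
    """|POS_attrs(D)|: number of objects whose equivalence class is label-pure."""
    return sum(len(b) for b in partition(rows, attrs)
               if len({labels[i] for i in b}) == 1)


def quick_reduct(rows, labels, attrs):
    """Greedy reduct: repeatedly add the attribute that most enlarges the
    positive region until it equals the positive region of all attrs."""
    target = positive_count(rows, labels, attrs)
    reduct = []
    while positive_count(rows, labels, reduct) < target:
        remaining = [a for a in attrs if a not in reduct]
        best = max(remaining,
                   key=lambda a: positive_count(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


# Hypothetical toy corpus: (class label, tokens)
corpus = [
    ("sports", "ball ball team goal goal".split()),
    ("sports", "team team ball score".split()),
    ("finance", "stock stock market bank".split()),
    ("finance", "market market stock bank bank".split()),
]
labels = [c for c, _ in corpus]
docs = [toks for _, toks in corpus]

features = df_with_min_tf(docs, min_tf=2, min_df=1)
rows = build_table(docs, features)
print(features)                              # ['ball', 'bank', 'goal', 'market', 'stock', 'team']
print(quick_reduct(rows, labels, features))  # ['ball']
```

The greedy search keeps adding whichever term most enlarges the positive region. It stops once the reduct discerns the classes as well as the full feature set does. On the toy corpus, a single term already separates the two classes. Greedy search of this kind does not guarantee a minimal reduct.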
