Abstract: Feature subsets selected by traditional feature selection methods contain redundant features and are not sufficiently representative. To address this problem, this paper proposes a feature selection method based on Rough Set (RS) theory and the pansystems equivalence operator. The method extracts the initial features using document frequency based on a minimum word frequency, extends RS with the pansystems equivalence operator, and gives an attribute reduction algorithm to eliminate redundancy, thereby obtaining a more representative feature subset. Experimental results show that the method achieves relatively high precision and recall.
Key words: text categorization, document frequency, pansystems equivalence operator, Rough Set (RS), attribute reduction
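
The initial feature extraction step can be illustrated with a minimal sketch. The abstract does not define "document frequency based on minimum word frequency" precisely, so the code below assumes one plausible reading: a document counts toward a term's document frequency only if the term occurs in it at least `min_tf` times. The function name, the parameters `min_tf` and `min_df`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_by_document_frequency(docs, min_tf=2, min_df=2):
    """Return, in sorted order, the terms whose thresholded document
    frequency is at least min_df.

    docs is a list of token lists. Assumption (not from the paper): a
    document contributes to a term's document frequency only when the
    term occurs at least min_tf times in that document.
    """
    df = Counter()
    for tokens in docs:
        tf = Counter(tokens)  # term frequency within this document
        for term, count in tf.items():
            if count >= min_tf:
                df[term] += 1
    return sorted(term for term, n in df.items() if n >= min_df)


# Hypothetical toy corpus of three tokenized documents.
docs = [
    ["rough", "set", "rough", "set", "text"],
    ["rough", "rough", "feature", "text"],
    ["set", "set", "feature", "feature"],
]
print(select_by_document_frequency(docs, min_tf=2, min_df=2))
```

Under this reading, incidental single occurrences of a term do not inflate its document frequency, so the initial feature set is already somewhat filtered before any rough-set attribute reduction is applied.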
ZHU Hao-Dong, ZHONG Yong. Feature Selection Based on Rough Set and Pansystems Equivalence Operator[J]. Computer Engineering, 2010, 36(19): 39-41.