Computer Engineering ›› 2011, Vol. 37 ›› Issue (10): 167-169. doi: 10.3969/j.issn.1000-3428.2011.10.057

• Networks and Communications •

Improved Feature Selection Method for Relative Entropy

WANG Hui, ZHANG Cheng-suo, ZHUO Cheng-xiang   

  1. (Beijing Kangkai Information Consultation Co., Ltd., Beijing 100007, China)
  • Online: 2011-05-20 Published: 2011-05-20

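  • About the authors: WANG Hui (born 1982), male, master's degree, main research interest: data mining; ZHANG Cheng-suo, master's degree; ZHUO Cheng-xiang, bachelor's degree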

Abstract: Feature selection is one of the key technologies in text categorization. This paper proposes an improved feature selection method based on relative entropy. Starting from the observation that the category of a text is usually determined by a limited number of feature words, the method applies the basic principle of relative entropy to select the words that best distinguish texts inside a category from texts outside it. Experimental results on a specific text corpus show that the proposed method reduces the feature dimension and improves classification precision.

Key words: feature selection, relative entropy, text categorization, corpus
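
The abstract does not state the scoring formula, so the following is only a minimal sketch of one way relative entropy could drive feature selection, not the authors' method. It assumes each word is scored by its pointwise contribution to the relative entropy KL(P_in || P_out) between the in-category and out-of-category word distributions, with additive smoothing. The function names `kl_term_scores` and `select_features`, the `smoothing` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def kl_term_scores(docs, labels, target, smoothing=1.0):
    """Score each word by p_in * log(p_in / p_out).

    Assumption (not from the paper): p_in and p_out are smoothed word
    frequencies inside and outside the target category. The scores sum
    to KL(P_in || P_out) over the shared vocabulary.
    """
    in_counts, out_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (in_counts if label == target else out_counts).update(tokens)

    vocab = set(in_counts) | set(out_counts)
    # Additive smoothing keeps both distributions strictly positive.
    in_total = sum(in_counts.values()) + smoothing * len(vocab)
    out_total = sum(out_counts.values()) + smoothing * len(vocab)

    scores = {}
    for word in vocab:
        p_in = (in_counts[word] + smoothing) / in_total
        p_out = (out_counts[word] + smoothing) / out_total
        scores[word] = p_in * math.log(p_in / p_out)
    return scores


def select_features(docs, labels, target, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = kl_term_scores(docs, labels, target)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team", "goal"],
    ["team", "coach", "goal"],
    ["stock", "market", "price"],
    ["market", "bank", "price", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

top_sports = select_features(docs, labels, "sports", k=2)
print(top_sports)
```

On this toy corpus, "goal" ranks first for the "sports" category because it is frequent inside the category and absent outside it, while "team" scores lower because it appears in both. Keeping only the top k words per category is what reduces the feature dimension.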
