
Computer Engineering ›› 2011, Vol. 37 ›› Issue (10): 167-169. doi: 10.3969/j.issn.1000-3428.2011.10.057

• Artificial Intelligence and Recognition Technology •

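An Improved Relative Entropy Feature Selection Method

WANG Hui, ZHANG Cheng-suo, ZHUO Cheng-xiang

  1. (Beijing Kangkai Information Consultation Co., Ltd., Beijing 100007)
  • Publication date: 2011-05-20 Release date: 2011-05-20
  • About the authors: WANG Hui (b. 1982), male, M.S., main research interest: data mining; ZHANG Cheng-suo, M.S.; ZHUO Cheng-xiang, B.S.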

Improved Feature Selection Method for Relative Entropy

WANG Hui, ZHANG Cheng-suo, ZHUO Cheng-xiang   

  1. (Beijing Kangkai Information Consultation Co., Ltd., Beijing 100007, China)
  • Online: 2011-05-20 Published: 2011-05-20

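Abstract: An improved relative entropy feature selection method is proposed. Starting from the observation that the textual attributes of a category are usually determined by a limited number of feature words, the method applies the basic principle of relative entropy to select the words that best distinguish in-class texts from out-of-class texts as the features for text classification. Experiments on a specific text corpus show that the method can reduce the dimensionality of the text features and improve classification accuracy.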

Keywords: feature selection, relative entropy, text classification, corpus

Abstract: This paper proposes an improved feature selection method based on relative entropy. Feature selection is one of the key technologies in text categorization. Based on the observation that a text category is determined by a limited number of keywords, the method uses relative entropy to select the words that most effectively distinguish texts inside a category from texts outside it. Experimental results show that the proposed method can effectively reduce the feature dimension and improve precision.

Key words: feature selection, relative entropy, text categorization, corpus
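
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each word is scored by its term in the relative entropy (Kullback-Leibler divergence) between the word distribution inside a category and the word distribution outside it, and that the top-scoring words are kept. The function names, the add-alpha smoothing, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def kl_term_scores(docs, labels, target, alpha=1.0):
    """Score each word by its term in KL(P_in || P_out).

    docs   : list of token lists
    labels : category label for each document (same order as docs)
    target : the category whose distinguishing words we want
    alpha  : add-alpha smoothing constant (an assumption of this sketch)
    """
    in_counts, out_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (in_counts if label == target else out_counts).update(tokens)

    vocab = set(in_counts) | set(out_counts)
    in_total = sum(in_counts.values()) + alpha * len(vocab)
    out_total = sum(out_counts.values()) + alpha * len(vocab)

    scores = {}
    for word in vocab:
        p = (in_counts[word] + alpha) / in_total    # P(word | in class)
        q = (out_counts[word] + alpha) / out_total  # P(word | out of class)
        scores[word] = p * math.log(p / q)          # this word's KL term
    return scores


def select_features(docs, labels, target, k):
    """Return the k highest-scoring words; ties are broken alphabetically."""
    scores = kl_term_scores(docs, labels, target)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy corpus, invented purely for illustration.
docs = [
    ["stock", "market", "rises"],
    ["stock", "fund", "market"],
    ["team", "wins", "match"],
    ["team", "match", "market"],
]
labels = ["finance", "finance", "sports", "sports"]
print(select_features(docs, labels, "finance", 2))
```

Words that are frequent inside the target category and rare outside it receive large positive scores, while words that are more typical of other categories receive negative scores, so keeping only the top k words per category shrinks the feature space. A word such as "market", which occurs in both categories, ranks below the words that occur only in the finance documents.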
