Computer Engineering, 2009, Vol. 35, Issue 19: 56-58, 6. doi: 10.3969/j.issn.1000-3428.2009.19.018

• Software Technology and Database •

基于Log似然比的特征选择算法

林 森,唐发根   

  1. (北京航空航天大学计算机学院,北京 100083)

Feature Selection Algorithm Based on Log Likelihood Ratio

LIN Sen, TANG Fa-gen   

  1. (School of Computer, Beijing University of Aeronautics and Astronautics, Beijing 100083)
  • Online: 2009-10-05 Published: 2009-10-05

Abstract: To address the shortcomings of existing feature selection algorithms in text classification systems based on the vector space model, this paper proposes a feature selection algorithm based on the log likelihood ratio. By introducing the log likelihood ratio statistic, the algorithm accounts for the positive impact of rare events on classification results while keeping their negative impact under control. Using the K Nearest Neighbor (KNN) classification method, the algorithm is compared with typical feature selection algorithms, and the experimental results show that it achieves good performance.

Key words: text categorization, vector space model, feature selection
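
The abstract does not state the scoring formula, so the following is only a minimal sketch of how a log likelihood ratio score is commonly computed for a term and a category, assuming the standard G² statistic over a 2x2 term/category contingency table. The function names `log_likelihood_ratio` and `top_k_terms`, and the representation of documents as sets of terms, are illustrative assumptions and are not taken from the paper.

```python
import math


def log_likelihood_ratio(a, b, c, d):
    """G^2 statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Empty cells contribute nothing (x * ln x tends to 0 as x tends to 0).
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def top_k_terms(docs, labels, category, k):
    """Rank terms by G^2 against one category; each doc is a set of terms."""
    vocabulary = set().union(*docs)
    in_cat = sum(1 for y in labels if y == category)
    out_cat = len(docs) - in_cat
    scores = {}
    for term in vocabulary:
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == category)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != category)
        scores[term] = log_likelihood_ratio(a, b, in_cat - a, out_cat - b)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Note that G² is two-sided: a term that is strongly absent from a category scores as high as one that is strongly present in it. The retained terms would then define the vector space in which a KNN classifier operates.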
