
Computer Engineering ›› 2010, Vol. 36 ›› Issue (12): 22-24. doi: 10.3969/j.issn.1000-3428.2010.12.008

• Doctoral Dissertation •

Feature Selection Method for Text Category Based on Independence Theory

FENG Xia (冯 霞), LIU Zhi-hui (刘志辉), TIAN Ji-cun (田继存)

  1. (School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300)
  • Online: 2010-06-20 Published: 2010-06-20
  • About the authors: FENG Xia (born 1970), female, professor, Ph.D.; main research interests: data mining and content-based image retrieval. LIU Zhi-hui and TIAN Ji-cun, master's degree
  • Funding:

    National Natural Science Foundation of China (60776806, 60672174); Civil Aviation University of China Doctoral Start-up Fund (06qd08s)

Abstract:

The degree of independence between a feature and each document category in a text collection reflects how representative the feature is, and feature selection for text categorization is the process of choosing highly representative features that improve classification performance. Based on this principle, this paper proposes two new feature selection methods for text categorization, DHChi2 and EIBA, and combines the two methods in a reasonable way. Experimental results show that applying independence theory to feature selection for text categorization helps improve classification performance.

Key words: feature selection, text categorization, hypothesis testing, independence theory
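
As a rough illustration of the independence idea in the abstract, the sketch below scores each term with the standard chi-square test of independence between "term occurs in document" and "document belongs to category", then keeps the highest-scoring terms. This is the textbook chi-square statistic, not the DHChi2 or EIBA methods proposed in the paper, whose definitions are not given on this page. The function names, the max-over-categories aggregation, and the input layout (token lists plus one label per document) are assumptions made for illustration.

```python
from collections import Counter


def chi2_term_category(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no evidence against independence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Rank terms by their largest chi-square score over all categories.

    docs:   list of token lists (hypothetical input layout)
    labels: one category label per document
    k:      number of terms to keep
    Returns (top_k_terms, scores), where scores maps term -> score.
    """
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    vocab = sorted(set().union(*doc_sets))
    categories = sorted(set(labels))
    cat_size = Counter(labels)

    scores = {}
    for term in vocab:
        best = 0.0
        for cat in categories:
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cat)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cat)
            c = cat_size[cat] - a
            d = n_docs - cat_size[cat] - b
            best = max(best, chi2_term_category(a, b, c, d))
        scores[term] = best

    # Higher score = stronger dependence on some category; ties break alphabetically.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores
```

A term that appears about equally often in every category scores near zero (it looks independent of the category and is a weak feature), while a term concentrated in one category scores high. Taking the maximum over categories is only one common way to turn per-category scores into a single ranking; a weighted average is another.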

CLC number: