
Computer Engineering, 2009, Vol. 35, Issue (2): 194-196. doi: 10.3969/j.issn.1000-3428.2009.02.068

• Artificial Intelligence and Recognition Technology •

Mixed Method of Reducing Feature in Text Classification

LIU Hai-feng1,2, WANG Yuan-yuan1, YAO Ze-qing2, ZHANG Shu-zu2

  1. Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007
  2. Institute of Sciences, PLA University of Science and Technology, Nanjing 210007
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-01-20 Published: 2009-01-20

Abstract: A hybrid method for reducing the dimensionality of text features is proposed, combining feature selection with feature extraction. The respective characteristics of selection-based and extraction-based dimensionality reduction are analyzed. The feature set is first filtered using information about how differently each feature term is distributed across categories. A new feature extraction method based on Principal Component Analysis (PCA) is then applied to the remaining features as a second reduction step, achieving effective dimensionality reduction of text features while keeping information loss to a minimum. Text classification experiments show that the method yields good classification performance.

Key words: text classification, feature selection, feature extraction, Principal Component Analysis (PCA)
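
The two-stage idea summarized in the abstract (select first, then extract) can be illustrated with a minimal sketch. This record does not give the paper's actual selection criterion or its new PCA variant, so both stages here are assumptions: stage 1 scores each term by the variance of its mean weight across classes, and stage 2 is ordinary PCA computed by SVD. The function names `select_by_class_spread`, `pca_project`, and `hybrid_reduce` are hypothetical.

```python
import numpy as np


def select_by_class_spread(X, y, k):
    """Stage 1 (assumed criterion, not the paper's): keep the k terms whose
    mean weight differs most from class to class."""
    classes = np.unique(y)
    # One row per class, one column per term: mean term weight within the class.
    class_means = np.vstack([X[y == c].mean(axis=0) for c in classes])
    # A term spread evenly over all classes scores near zero.
    scores = class_means.var(axis=0)
    top_k = np.argsort(scores)[::-1][:k]
    return np.sort(top_k)


def pca_project(X, n_components):
    """Stage 2 (plain PCA, standing in for the paper's variant): project the
    centered data onto its leading principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def hybrid_reduce(X, y, k, n_components):
    """Run selection, then extraction on the surviving columns."""
    keep = select_by_class_spread(X, y, k)
    return keep, pca_project(X[:, keep], n_components)


if __name__ == "__main__":
    # Toy document-term matrix: 4 documents, 4 terms, 2 classes.
    X = np.array([
        [1.0, 0.5, 0.0, 0.2],
        [0.9, 0.5, 0.1, 0.2],
        [0.0, 0.5, 1.0, 0.2],
        [0.1, 0.5, 0.9, 0.2],
    ])
    y = np.array([0, 0, 1, 1])
    keep, Z = hybrid_reduce(X, y, k=2, n_components=1)
    print("kept term indices:", keep)
    print("reduced shape:", Z.shape)
```

Running selection before PCA keeps the matrix handed to the SVD small, which is the practical appeal of a hybrid scheme when the vocabulary is large; how many terms to keep and how many components to retain are tuning choices this sketch leaves open.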
