Computer Engineering, 2008, Vol. 34, Issue 23: 89-91.

Section: Software Technology and Database

Research on Feature Selection in Chinese Text Genre Classification

DENG Qi, SU Yi-dan, CAO Bo, BI Jian-ting   

  1. (College of Computer and Electronic Information, Guangxi University, Nanning 530004)
  • Online: 2008-12-05 Published: 2008-12-05

Abstract: Automatic text genre classification places particular demands on feature selection and feature weighting. To address them, this paper introduces the content category information of a text and uses it to improve both the conventional CHI feature selection method and the tf.idf feature weighting formula. A Support Vector Machine (SVM) is then used to automatically classify a Chinese text corpus covering five genres. Experimental results show that the scheme is feasible.

Keywords: Chinese information processing, genre classification, feature selection, Support Vector Machine (SVM)
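
For orientation, the two conventional baselines named in the abstract can be sketched as follows. This is a minimal illustration of the textbook CHI (chi-square) statistic and the textbook tf.idf weight only; the abstract does not spell out the paper's improved versions, so they are not reproduced here. The function names, the toy corpus, and the genre labels are hypothetical, and documents are assumed to be already segmented into word tokens.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, cls):
    """Textbook CHI score of `term` for class `cls` (not the paper's variant).

    docs   -- list of token lists (assumed to be pre-segmented text)
    labels -- list of class labels, one per document
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document not in class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0          # term or class never (or always) occurs
    return n * (a * d - c * b) ** 2 / denom


def tf_idf(docs):
    """Textbook tf.idf weights: raw term frequency times log(N / df)."""
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for tokens in docs
    ]


# Hypothetical toy corpus with two genre labels.
docs = [
    ["reported", "today", "said"],
    ["said", "official", "today"],
    ["i", "feel", "today"],
    ["i", "remember", "feel"],
]
labels = ["news", "news", "essay", "essay"]

score = chi_square(docs, labels, "said", "news")
weights = tf_idf(docs)
```

In a pipeline of the kind the abstract describes, terms would be ranked by a score such as `chi_square`, the top-ranked terms kept as features, and each document represented by its weights over those features before being passed to an SVM classifier.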
