
Computer Engineering (计算机工程) ›› 2008, Vol. 34 ›› Issue (23): 89-91. doi: 10.3969/j.issn.1000-3428.2008.23.033

• Software Technology and Database •

Research on Feature Selection in Chinese Text Genre Classification (中文文本体裁分类中特征选择的研究)

DENG Qi (邓琦), SU Yi-dan (苏一丹), CAO Bo (曹波), BI Jian-ting (闭剑婷)

  1. (College of Computer and Electronic Information, Guangxi University, Nanning 530004)
  • Online: 2008-12-05 Published: 2008-12-05

Abstract: Automatic text genre classification places particular demands on feature selection and weight calculation. To address them, this paper introduces the content category information of texts and uses it to improve the conventional CHI feature selection method and the tf.idf feature weighting formula. A Support Vector Machine (SVM) is then used to automatically classify a Chinese text corpus covering five genres. Experimental results show that the scheme is feasible.
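For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of the conventional CHI (chi-square) statistic and the conventional tf.idf weight in their common textbook forms. It does not reproduce the improved versions proposed in the paper, which the abstract does not specify. The function names `chi_square` and `tf_idf` and the counts `a`, `b`, `c`, `d` are illustrative assumptions, not taken from the paper.

```python
import math


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Conventional CHI statistic of one term for one class (textbook form).

    Assumed 2x2 document counts (hypothetical names):
      a: documents in the class that contain the term
      b: documents outside the class that contain the term
      c: documents in the class that do not contain the term
      d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat as no association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def tf_idf(tf: int, df: int, n_docs: int) -> float:
    """Conventional tf.idf weight: term frequency times log(N / df)."""
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(n_docs / df)


if __name__ == "__main__":
    # A term that appears in every document of the class and nowhere else
    # scores high; a term spread evenly across classes scores zero.
    print(chi_square(10, 0, 0, 10))
    print(chi_square(5, 5, 5, 5))
    print(tf_idf(3, 10, 100))
```

In a typical pipeline, terms are ranked by their CHI score per class, the top-ranked terms are kept as features, and each kept term is weighted by tf.idf before the document vectors are passed to a classifier such as an SVM.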

Key words: Chinese information processing, genre classification, feature selection, Support Vector Machine (SVM)
