Computer Engineering, 2006, Vol. 32, Issue (20): 92-94. doi: 10.3969/j.issn.1000-3428.2006.20.034

• Software Technology and Database •

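Application of Signature Sequence Analysis in Text Classification

LU Yansheng, CUI Dexuan, ZOU Lei

  1. (College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074)
  • Online: 2006-10-20 Published: 2006-10-20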

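Abstract: This paper applies a DNA sequence analysis method from computational biology to text classification. It proposes SSAM, a text classification method based on analyzing the signature sequences generated from a document collection, which describe the intrinsic features of each category. SSAM is compared with several other common classification methods on the Reuters21578 dataset. The experimental results show that SSAM achieves better classification performance than the traditional Naive Bayes method and also classifies relatively quickly.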

Key words: Text classification, Vector space model, Signature sequence
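
The abstract reports SSAM's performance mainly relative to the traditional Naive Bayes baseline. As a point of reference for that baseline, here is a minimal multinomial Naive Bayes classifier with Laplace smoothing. It is a generic textbook formulation, not SSAM, and not necessarily the Naive Bayes variant or settings used in the paper's experiments; the helper names `train_nb` and `classify_nb`, the smoothing constant `alpha=1.0`, and the toy documents are all hypothetical and are not taken from Reuters21578.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model (hypothetical helper, not from the paper).

    docs:   list of token lists
    labels: list of category names, one per document
    alpha:  Laplace smoothing constant (assumed value)
    """
    vocab = {t for doc in docs for t in doc}
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)

    n_docs = len(docs)
    # Log prior P(c): fraction of training documents in each category.
    priors = {c: math.log(n / n_docs) for c, n in class_docs.items()}

    # Smoothed log likelihood P(t | c) for every vocabulary term.
    likelihoods = {}
    for c in class_docs:
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {
            t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab
        }
    return priors, likelihoods


def classify_nb(model, doc):
    """Return the category with the highest log posterior score."""
    priors, likelihoods = model
    scores = {}
    for c, prior in priors.items():
        # Terms never seen in training are skipped.
        scores[c] = prior + sum(
            likelihoods[c][t] for t in doc if t in likelihoods[c]
        )
    return max(scores, key=scores.get)


# Toy usage with hypothetical documents.
docs = [
    ["wheat", "export", "grain"],
    ["grain", "harvest", "wheat"],
    ["oil", "crude", "barrel"],
    ["crude", "oil", "price"],
]
labels = ["grain", "grain", "crude", "crude"]
model = train_nb(docs, labels)
print(classify_nb(model, ["wheat", "price"]))  # prints "grain"
```

Scores are accumulated as sums of logarithms rather than products of probabilities, which avoids floating-point underflow on long documents.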