Computer Engineering ›› 2006, Vol. 32 ›› Issue (20): 92-94. doi: 10.3969/j.issn.1000-3428.2006.20.034

• Software Technology and Database •

Application of Signature Sequence Analysis in Text Classification

LU Yansheng (卢炎生), CUI Dexuan (崔得暄), ZOU Lei (邹磊)

  1. (College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074)
  • Online: 2006-10-20 Published: 2006-10-20

Abstract: This paper applies a DNA sequence analysis method from computational biology to text classification. By analyzing the signature sequences generated from a document collection, which describe the intrinsic characteristics of each category, it proposes a text classification method called SSAM. Experiments on the Reuters-21578 dataset, with comparisons against several other common classification methods, show that SSAM achieves better classification results than the traditional Naive Bayes method and classifies quickly.

Key words: Text classification, Vector space model, Signature sequence
