Abstract:
This paper applies a DNA sequence analysis method from computational biology to text classification. It proposes a text classification method, SSAM, which analyzes the signature sequences generated from a document collection; these sequences describe the intrinsic characteristics of each category. Experiments on Reuters-21578, with comparisons against several other common text classification methods, show that SSAM outperforms Naive Bayes and classifies quickly.
Keywords: text classification; vector space model; signature sequence
LU Yansheng; CUI Dexuan; ZOU Lei. Application of Signature Sequence Analysis in Text Classification[J]. Computer Engineering, 2006, 32(20): 92-94.
卢炎生;崔得暄;邹 磊. 特征序列分析方法在文本分类中的应用[J]. 计算机工程, 2006, 32(20): 92-94.