
Computer Engineering (计算机工程) ›› 2006, Vol. 32 ›› Issue (2): 175-176, 222.

• Artificial Intelligence and Recognition Technology •

Text Categorization Based on the Lee Model

靳小波,夏清国   

  1. School of Computer, Northwestern Polytechnical University (西北工业大学), Xi’an 710072
  • Publication date: 2006-01-20 Release date: 2006-01-20

Text Categorization with Lee Model

JIN Xiaobo, XIA Qingguo   

  1. School of Computer, Northwestern Polytechnical University, Xi’an 710072
  • Online:2006-01-20 Published:2006-01-20

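Abstract: David Lee proposed the Lee model from a psychological perspective and applied it to text categorization. This paper introduces the Lee model into Naïve Bayes and TFIDF, compares how two document representations, influence and TF-IDF, affect classification accuracy, and analyzes how the different factors of the Lee model affect the algorithms. The results show that the influence-based document representation performs somewhat better than TF-IDF, and that the heuristic partial-reading strategy can greatly improve the accuracy of the classification algorithms at a small time cost.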

Key words: text categorization; Lee model; Naïve Bayes; TFIDF

Abstract: David Lee proposed a psychologically motivated model for text categorization. This paper introduces Lee’s model into Naïve Bayes and TFIDF, compares how two vector representations, influence and TFIDF, affect classification precision, and analyzes how two factors in the model affect the algorithm differently. Experiments show that the heuristic method and the influence representation can greatly improve Naïve Bayes at a much lower time cost.

Key words: Text categorization; Lee’s model; Naïve Bayes; TFIDF