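Abstract: In text classification, the Bayes classifier is a widely used and effective probability-based classifier with a fairly rigorous theoretical foundation. This paper analyzes the naive Bayes text classifier, proposes a weight-adjustment mechanism to improve its classification performance, and proposes using the EM algorithm for unsupervised Bayes classification when large amounts of training text are unavailable. It also discusses how heuristic methods can be used to determine the structure of a Bayesian network, so that text classification can be carried out under conditions closer to real-world settings.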
Keywords: text classification, naive Bayes classification model, weight adjustment, EM algorithm
Abstract: The Bayesian classification model is a common and effective probability-based approach to text categorization, with a rigorous theoretical basis. This paper analyzes the simple and widely used naive Bayesian categorization model. It presents an approach that improves the model's performance through weight adjustment, and an approach that performs unsupervised Bayesian categorization with the EM algorithm when large amounts of training text are lacking. It also discusses how to determine the structure of a Bayesian network using heuristic methods, so that text can be classified under more realistic conditions.
Key words: Text categorization, Naïve Bayesian categorization model, Weight adjustment, EM algorithm
CLC number:
鲁明羽. Bayes文本分类器的改进方法研究[J]. 计算机工程, 2006, 32(17): 63-65.
LU Mingyu. Research on Improvement of Bayesian Text Classifier[J]. Computer Engineering, 2006, 32(17): 63-65.