Computer Engineering ›› 2006, Vol. 32 ›› Issue (17): 63-65. doi: 10.3969/j.issn.1000-3428.2006.17.022

• Special Paper •

Research on Improvement of Bayesian Text Classifier

LU Mingyu   

  1. (College of Computer Science and Technology, Dalian Maritime University, Dalian 116026)
  • Received:1900-01-01 Revised:1900-01-01 Online:2006-09-05 Published:2006-09-05

Abstract: The Bayesian classifier is a widely used and effective probability-based model for text categorization, and it rests on a rigorous theoretical foundation. This paper analyzes the naive Bayesian text classifier and presents two methods: one that improves its classification performance through weight adjustment, and one that performs unsupervised Bayesian classification with the EM algorithm when large amounts of training text are unavailable. It also discusses how to determine the structure of a Bayesian network with heuristic methods, so that text classification can be carried out under conditions closer to real-world settings.

Key words: Text categorization, Naïve Bayesian categorization model, Weight adjustment, EM algorithm
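
For readers unfamiliar with the baseline model named in the abstract, the following is a minimal sketch of a standard multinomial naive Bayesian text classifier with add-one (Laplace) smoothing. It is not the paper's improved method: the weight-adjustment scheme and the EM procedure are not specified on this page, so neither is shown. The function names, the toy documents, and the class labels are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Estimate log priors and smoothed log likelihoods from tokenized documents."""
    vocab = {word for doc in docs for word in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    # Class prior: fraction of training documents in each class.
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}

    # Word likelihood per class, with add-one smoothing over the vocabulary
    # (a common default, assumed here; the paper may use another estimate).
    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(doc, model):
    """Return the class with the highest posterior log score for a tokenized document."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped rather than scored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc if w in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
train_docs = [
    ["ship", "port", "cargo"],
    ["cargo", "vessel", "port"],
    ["neural", "network", "training"],
    ["training", "data", "network"],
]
train_labels = ["shipping", "shipping", "computing", "computing"]

model = train_naive_bayes(train_docs, train_labels)
predicted = classify(["cargo", "ship", "unknownword"], model)
```

The "naive" assumption is visible in `classify`: each word contributes an independent additive term to the log score. A weight-adjustment variant would presumably rescale those per-word terms, but the exact form is left to the full paper.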
