Computer Engineering ›› 2006, Vol. 32 ›› Issue (17): 63-65. doi: 10.3969/j.issn.1000-3428.2006.17.022

• Special Topic Paper •

Research on Improvement of Bayesian Text Classifier (Bayes文本分类器的改进方法研究)

LU Mingyu (鲁明羽)

  1. (College of Computer Science and Technology, Dalian Maritime University, Dalian 116026)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2006-09-05 Published: 2006-09-05

Abstract: The Bayesian classifier is a widely used and fairly effective probability-based classifier for text categorization, with a rigorous theoretical foundation. This paper analyzes the naive Bayesian text classifier and proposes a weight adjustment mechanism to improve its classification performance, together with a method that uses the EM algorithm for unsupervised Bayesian classification when large amounts of training text are unavailable. It also discusses how heuristic methods can be used to determine the structure of a Bayesian network, so that text can be classified under conditions closer to real-world settings.

Key words: Text categorization, Naïve Bayesian categorization model, Weight adjustment, EM algorithm
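
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of a standard multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is the textbook baseline only, not the paper's weight adjustment or EM-based method, whose details are not given on this page. The function names `train_naive_bayes` and `classify` and the toy documents are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Estimate log-priors and Laplace-smoothed log-likelihoods from tokenized docs."""
    vocab = {w for doc in docs for w in doc}
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # per-class word frequencies
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        # Add-one smoothing: every vocabulary word gets one pseudo-count.
        total = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(doc, log_prior, log_likelihood, vocab):
    """Return the class with the highest log-score; out-of-vocabulary words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy, made-up training data (hypothetical, for illustration only).
train_docs = [
    ["goal", "match", "team"],
    ["team", "win", "match"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]
train_labels = ["sports", "sports", "finance", "finance"]

model = train_naive_bayes(train_docs, train_labels)
print(classify(["team", "goal"], *model))
print(classify(["price", "stock"], *model))
```

Scores are kept in log space so that products of many small word probabilities do not underflow. The "naive" part is the assumption that words are conditionally independent given the class, which is the assumption that the Bayesian network structure discussed in the abstract would relax.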
