Computer Engineering ›› 2007, Vol. 33 ›› Issue (15): 190-192. doi: 10.3969/j.issn.1000-3428.2007.15.067

• Artificial Intelligence and Recognition Technology •

Naive Bayes Text Classifier Based on Bootstrap Average

BAI Li-yuan1, HUANG Hui2, LIU Su-hua1, YAN Qiu-ling1

  1. School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450052; 2. School of Science, Henan University of Technology, Zhengzhou 450052
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-08-05 Published: 2007-08-05

Abstract: Naive Bayes text classifiers trained on word clusters suffer from heavily biased probability estimates, which lowers classification accuracy. To address this problem, the proposed method starts from the word clusters produced by a distributional clustering algorithm, builds ordered word subsequences according to the mutual information between words and clusters, draws sample sets of comparable size from these sequences by random sampling with replacement (the bootstrap), and uses the average of the parameters estimated on the samples as the trained parameters for classifying unseen text. Experimental results on a public benchmark text dataset show that the proposed training method achieves higher classification accuracy than the conventional naive Bayes training method, while the procedure remains relatively simple.

Key words: distributional clustering, text classification, naive Bayes classifier, bootstrap average
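
The bootstrap-averaging step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it leaves out the distributional clustering and the mutual-information ordering, and it assumes that each class's word occurrences are resampled with replacement into replicates of the original size, that each replicate yields a Laplace-smoothed multinomial estimate, and that the estimates are averaged. The names `estimate_params`, `bootstrap_average`, `train`, `classify`, and the parameters `n_rounds`, `alpha`, and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def estimate_params(tokens, vocab, alpha=1.0):
    """Laplace-smoothed multinomial estimate of P(word | class)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def bootstrap_average(tokens, vocab, n_rounds=50, rng=None):
    """Average the estimates over n_rounds resamples drawn with replacement.

    Each resample has the same size as the original token list (assumption).
    """
    rng = rng or random.Random(0)
    avg = {w: 0.0 for w in vocab}
    for _ in range(n_rounds):
        sample = rng.choices(tokens, k=len(tokens))
        params = estimate_params(sample, vocab)
        for w in vocab:
            avg[w] += params[w] / n_rounds
    return avg


def train(docs_by_class, n_rounds=50, seed=0):
    """Return {label: (log prior, averaged conditional probabilities)}."""
    rng = random.Random(seed)
    vocab = sorted({w for docs in docs_by_class.values() for d in docs for w in d})
    n_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        tokens = [w for d in docs for w in d]
        log_prior = math.log(len(docs) / n_docs)
        model[label] = (log_prior, bootstrap_average(tokens, vocab, n_rounds, rng))
    return model


def classify(model, doc):
    """Pick the class with the highest log posterior; unseen words are skipped."""
    def score(label):
        log_prior, cond = model[label]
        return log_prior + sum(math.log(cond[w]) for w in doc if w in cond)
    return max(model, key=score)


# Toy data, invented purely for illustration.
docs = {
    "sport": [["ball", "goal", "team"], ["team", "win", "goal"]],
    "tech": [["code", "bug", "chip"], ["chip", "code", "data"]],
}
model = train(docs)
print(classify(model, ["goal", "team"]))
```

Because every replicate produces a proper probability distribution over the vocabulary, their average is one as well, so the averaged parameters can be used in the usual naive Bayes decision rule without renormalization.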
