Computer Engineering ›› 2008, Vol. 34 ›› Issue (1): 61-63. doi: 10.3969/j.issn.1000-3428.2008.01.020

• Software Technology and Database •

Bagging Text Classification Algorithm Based on Classifier Performance Evaluation

ZHAO Su1, LI Xiu2, LIU Wen-huang1

  (1. Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055; 2. Department of Automation, Tsinghua University, Beijing 100084)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-01-05 Published: 2008-01-05

Abstract: This paper presents a performance evaluation model for text classifiers. The model estimates the reliability of a classifier's output from the classifier's learning results and naive Bayes, and a formula for computing this reliability is given. The reliability of each sub-classifier is then used as the weight of that sub-classifier's result in Bagging, an ensemble learning method, which yields Performance Bagging (PBagging), an improved Bagging algorithm based on sub-classifier performance evaluation. With support vector machines (SVM) as the base model for the sub-classifiers, PBagging is applied to a large news corpus from the Japanese news agency Kyodo News. Experiments show that PBagging achieves markedly higher classification accuracy than Bagging.

Key words: text classification, classifier performance, evaluation model, Bagging algorithm
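
The combination step described in the abstract, where each sub-classifier's result is weighted by its reliability, can be illustrated with a minimal Python sketch. The abstract does not give the reliability formula, so the function `accuracy_reliability` below is a hypothetical stand-in (plain held-out accuracy), not the paper's naive Bayes based estimate. The names `weighted_vote` and `majority_vote`, the labels, and the weights are also assumptions chosen only for illustration, and the sub-classifiers themselves (SVMs in the paper) are omitted.

```python
from collections import defaultdict


def accuracy_reliability(predicted, actual):
    """Hypothetical stand-in for a reliability score: held-out accuracy.

    The paper derives reliability from its own evaluation model; this
    function only shows where such a score would plug in.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equally long, non-empty label lists")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def weighted_vote(predictions, reliabilities):
    """Combine the labels that the sub-classifiers assign to one document.

    predictions   -- one label per sub-classifier
    reliabilities -- one non-negative weight per sub-classifier
    """
    if len(predictions) != len(reliabilities):
        raise ValueError("one reliability per sub-classifier is required")
    scores = defaultdict(float)
    for label, weight in zip(predictions, reliabilities):
        scores[label] += weight
    # Label with the largest total weight; on a tie the first-seen label wins.
    return max(scores, key=scores.get)


def majority_vote(predictions):
    """Plain Bagging: every sub-classifier votes with equal weight."""
    return weighted_vote(predictions, [1.0] * len(predictions))


# Made-up example: three sub-classifiers label one news article.
votes = ["sports", "politics", "politics"]
reliab = [0.9, 0.3, 0.4]  # illustrative reliabilities, not measured values

print(majority_vote(votes))          # politics
print(weighted_vote(votes, reliab))  # sports
```

In this toy case the equal-weight vote follows the two less reliable sub-classifiers, while the weighted vote follows the single more reliable one. This is the kind of difference a reliability-weighted scheme is meant to exploit, although whether it helps in practice depends on how well the reliability estimate reflects each sub-classifier's true accuracy.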

中图分类号: