
Computer Engineering ›› 2011, Vol. 37 ›› Issue (21): 114-116. doi: 10.3969/j.issn.1000-3428.2011.21.039

• Artificial Intelligence and Recognition Technology •

B-KNN Text Categorization Method Based on Posterior Probability Guidance

ZHOU Hong-juan, ZU Yong-liang

  1. (School of Computer & Information, Hefei University of Technology, Hefei 230009, China)
  • Received: 2011-04-20 Online: 2011-11-05 Published: 2011-11-05
  • About the authors: ZHOU Hong-juan (born 1973), female, lecturer, main research interests: data mining and artificial intelligence; ZU Yong-liang, master's degree
  • Supported by:
    National Natural Science Foundation of China (60975034)

Abstract: The K Nearest Neighbor (KNN) method classifies accurately but inefficiently. To address this, this paper proposes a Bayesian K Nearest Neighbor (B-KNN) text categorization method guided by posterior probability. B-KNN uses the posterior probabilities of the test text to prune the multi-branch static search tree built over the training set, then searches for the K nearest neighbors of the sample within the reduced candidate class space, which improves the efficiency of KNN while preserving its classification accuracy. Experimental results show that B-KNN performs considerably better than KNN and is better suited to text categorization tasks with a deep class hierarchy.

Key words: text categorization, posterior probability, Bayesian classifier, K Nearest Neighbor (KNN) method, Bayesian K Nearest Neighbor (B-KNN) method
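
The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a Laplace-smoothed multinomial Naive Bayes model as the Bayesian classifier, cosine similarity over bag-of-words counts as the KNN distance, and a plain filter over class labels in place of the multi-branch static search tree. The function names and the parameters `k` and `top_m` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document and token counts (multinomial Naive Bayes, assumed)."""
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, token_counts, vocab


def log_posteriors(tokens, model):
    """Unnormalized log posterior per class: log P(c) + sum of log P(t|c), Laplace-smoothed."""
    class_docs, token_counts, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for c, n_c in class_docs.items():
        total = sum(token_counts[c].values()) + len(vocab)
        score = math.log(n_c / n_docs)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[c][t] + 1) / total)
        scores[c] = score
    return scores


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bknn_classify(tokens, docs, labels, model, k=3, top_m=2):
    """Keep the top_m classes by posterior, then vote among the k nearest neighbors in them."""
    scores = log_posteriors(tokens, model)
    candidates = set(sorted(scores, key=scores.get, reverse=True)[:top_m])
    query = Counter(tokens)
    sims = [
        (cosine(query, Counter(d)), y)
        for d, y in zip(docs, labels)
        if y in candidates  # search only the pruned candidate classes
    ]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(y for _, y in sims[:k])
    return votes.most_common(1)[0][0]
```

Here `train_nb` is called once on the tokenized training documents, and `bknn_classify` is called per test document. A smaller `top_m` means fewer training documents are compared against the query, at the risk of pruning away the true class when the Bayesian posterior ranks it too low.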
