Abstract: The K Nearest Neighbor (KNN) method achieves high classification accuracy but low classification efficiency. To address this, this paper proposes a Bayesian K Nearest Neighbor (B-KNN) text categorization method guided by posterior probability. B-KNN uses the posterior probability information of the test text to prune the multi-way static search tree of the training set, and then searches for the K nearest neighbors of a sample within the compressed candidate class space, improving the efficiency of KNN while preserving its classification accuracy. Experimental results show that B-KNN performs considerably better than KNN and is better suited to text categorization tasks with a deep class hierarchy.
Key words: text categorization; posterior probability; Bayesian classifier; K Nearest Neighbor (KNN) method; Bayesian K Nearest Neighbor (B-KNN) method
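The pruning idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a flat class set instead of the multi-way static search tree, a simple multinomial naive Bayes model with Laplace smoothing for the posterior, cosine similarity over raw term counts, and a hypothetical parameter `keep` for the number of candidate classes retained. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Collect class and token counts."""
    class_docs = Counter(label for _, label in docs)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, token_counts, vocab

def log_posteriors(tokens, model):
    """Unnormalized log P(class | tokens) with Laplace smoothing."""
    class_docs, token_counts, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for label, n in class_docs.items():
        total = sum(token_counts[label].values())
        score = math.log(n / n_docs)
        for t in tokens:
            score += math.log((token_counts[label][t] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores

def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def bknn_classify(tokens, docs, model, k=3, keep=2):
    """Prune to the `keep` most probable classes, then vote among the k nearest."""
    scores = log_posteriors(tokens, model)
    # Assumption: pruning keeps a fixed number of top-posterior classes.
    candidates = set(sorted(scores, key=scores.get, reverse=True)[:keep])
    # Only training documents in candidate classes are compared with the test text.
    pool = [(cosine(tokens, d), label) for d, label in docs if label in candidates]
    pool.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in pool[:k])
    return votes.most_common(1)[0][0]
```

In this sketch the saving comes from skipping similarity computations for every training document outside the candidate classes; with a deep class hierarchy, pruning whole subtrees would remove proportionally more comparisons.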
CLC number:
ZHOU Hong-Juan (周红鹃), ZU Yong-Liang (祖永亮). B-KNN Text Categorization Method Based on Posterior Probability Guidance[J]. Computer Engineering, 2011, 37(21): 114-116.