Abstract: This paper proposes K-NCA, a K-nearest-neighbor classification algorithm built on Neighborhood Component Analysis (NCA). NCA is first applied to the training set to learn a Mahalanobis distance metric and to reduce the dimensionality of the data. The algorithm then defines a class skew (imbalance) factor and applies the K Nearest Neighbor (KNN) idea to estimate the class-conditional probability of each test sample; the sample's class label is assigned according to this probability. A text classifier implementing the algorithm is built. Experimental results show that K-NCA achieves good classification performance and can improve the accuracy of text classification.
Key words:
Neighborhood Component Analysis (NCA),
distance metric learning,
dimension reduction,
K Nearest Neighbor (KNN),
text classification
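The classification step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formula for the class imbalance (skew) factor or for the probability estimate, so the sketch assumes an inverse-class-frequency factor and NCA-style weights exp(-d^2) over the K nearest neighbors. The transform `A` stands in for a matrix already learned by NCA, and the names `project` and `knca_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def project(A, x):
    """Apply the linear map A (one row per output dimension) to vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def knca_predict(A, train_X, train_y, x, k=3):
    """Return (label, class_probabilities) for x under the stated assumptions."""
    # ASSUMPTION: skew factor = inverse class frequency in the training set.
    counts = Counter(train_y)
    skew = {c: len(train_y) / n for c, n in counts.items()}

    # Squared Euclidean distance after projection by A, which equals a
    # Mahalanobis distance with matrix A^T A in the original space.
    z = project(A, x)
    dists = []
    for xi, yi in zip(train_X, train_y):
        zi = project(A, xi)
        d2 = sum((p - q) ** 2 for p, q in zip(z, zi))
        dists.append((d2, yi))

    # Keep the k nearest neighbors.
    nearest = sorted(dists, key=lambda t: t[0])[:k]
    d_min = nearest[0][0]  # subtracted below for numerical stability

    # ASSUMPTION: each neighbor votes with weight skew * exp(-d^2).
    scores = defaultdict(float)
    for d2, yi in nearest:
        scores[yi] += skew[yi] * math.exp(-(d2 - d_min))

    # Normalize the votes into a probability estimate per class.
    total = sum(scores.values())
    probs = {c: s / total for c, s in scores.items()}
    label = max(probs, key=probs.get)
    return label, probs


# Hypothetical 3-D -> 1-D map standing in for a learned NCA transform.
A = [[1.0, 0.0, 0.0]]
train_X = [[0.0, 5.0, 1.0], [0.2, -3.0, 2.0],
           [1.0, 4.0, 0.0], [1.1, 0.0, 9.0], [0.9, 1.0, 1.0]]
train_y = ["sports", "sports", "finance", "finance", "finance"]
label, probs = knca_predict(A, train_X, train_y, [0.1, 0.0, 0.0], k=3)
print(label, probs)
```

In practice the matrix `A` would come from an NCA training step on document feature vectors; here it is hand-picked only so the example stays self-contained.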
LIU Cong-Shan, LI Xiang-Bao, YANG Yu-Pu. Text Classification Algorithm Based on Neighborhood Component Analysis[J]. Computer Engineering (计算机工程), 2012, 38(15): 139-141.