
Computer Engineering ›› 2012, Vol. 38 ›› Issue (15): 139-141. doi: 10.3969/j.issn.1000-3428.2012.15.038

• Networks and Communications •

Text Classification Algorithm Based on Neighborhood Component Analysis

LIU Cong-shan, LI Xiang-bao, YANG Yu-pu   

  1. (Key Laboratory of System Control and Information Processing, Ministry of Education, Department of Automation, Shanghai Jiaotong University, Shanghai 200240, China)
  • Received:2011-09-29 Online:2012-08-05 Published:2012-08-05

一种基于近邻元分析的文本分类算法

刘丛山,李祥宝,杨煜普   

  1. (上海交通大学自动化系系统控制与信息处理教育部重点实验室,上海 200240)
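  • About the authors: LIU Cong-shan (b. 1986), male, master's student, main research interest: machine learning; LI Xiang-bao, Ph.D. candidate; YANG Yu-pu, professor, Ph.D.
  • Supported by:
    National "863" Program of China, project "Key Technologies of Cloud Manufacturing Service Platform" (2011AA040605)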

Abstract: This paper proposes K-NCA, a novel classification algorithm based on Neighborhood Component Analysis (NCA). It uses NCA to learn a Mahalanobis distance metric on the training set and to reduce the dimensionality of the data. The algorithm defines a class imbalance factor and applies the K Nearest Neighbor (KNN) idea to estimate the class-conditional probability of each test sample, and the sample's class label is decided by this probability. A text classifier is built to implement the algorithm. Experimental results show that the K-NCA algorithm can improve the accuracy of text classification.

Key words: Neighborhood Component Analysis (NCA), distance metric learning, dimension reduction, K Nearest Neighbor (KNN), text classification
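
The abstract does not give the formulas behind K-NCA, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a linear map `A` has already been learned by NCA (here it is simply hand-picked), uses an NCA-style weight `exp(-d^2)` for each of the k nearest neighbors in the projected space, and takes the class imbalance factor to be each class's share of the training set. The function names, the toy data, and that definition of the factor are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def project(A, x):
    """Apply the linear map A (a list of rows) to the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_class_probabilities(A, train_X, train_y, x, k=3):
    """Estimate class probabilities for x from its k nearest neighbors.

    Assumption: A was learned beforehand by NCA, so squared Euclidean
    distance after projection equals the Mahalanobis distance induced by A^T A.
    """
    z = project(A, x)
    neighbors = sorted(
        (sq_dist(z, project(A, xi)), yi) for xi, yi in zip(train_X, train_y)
    )[:k]

    # Hypothetical imbalance factor: the class's share of the training set.
    counts = Counter(train_y)
    n = len(train_y)

    # Shift by the smallest distance so exp() cannot underflow to all zeros;
    # the shift cancels out in the normalization below.
    d_min = neighbors[0][0]
    weights = defaultdict(float)
    for d2, label in neighbors:
        weights[label] += math.exp(-(d2 - d_min)) / (counts[label] / n)

    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def classify(A, train_X, train_y, x, k=3):
    """Return the label with the highest estimated probability."""
    probs = knn_class_probabilities(A, train_X, train_y, x, k)
    return max(probs, key=probs.get)


# Toy data: A keeps the first feature and drops the noisy second one,
# reducing the input from two dimensions to one.
A = [[1.0, 0.0]]
train_X = [[0.0, 5.0], [0.2, -3.0], [0.1, 1.0], [1.0, 4.0]]
train_y = ["sports", "sports", "sports", "finance"]

print(classify(A, train_X, train_y, [0.9, -2.0], k=3))
```

In this toy set the minority class has a single training sample, which is the kind of skew the imbalance factor is meant to offset. Dividing by the class share is only one plausible choice, and the paper's actual factor may differ.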

摘要: 在近邻元分析(NCA)算法的基础上,提出K近邻元分析分类算法K-NCA。利用NCA算法完成对训练样本集的距离测度学习和降维,定义类偏斜因子,引入K近邻思想,得到测试样本的类条件概率估计,并通过该概率进行类别判定,实现文本分类器功能。实验结果表明,K-NCA算法的分类效果较好。

关键词: 近邻元分析, 距离测度学习, 降维, K近邻, 文本分类
