Abstract: Text classification often involves multiple heterogeneous data sources. To address this, the paper proposes a multiple kernel learning SVM algorithm. Learning a conic combination of kernel matrices for classification leads to a convex quadratically constrained quadratic program; the algorithm reformulates this problem as a semi-infinite program and shows that it can be solved efficiently by repeatedly calling a standard SVM implementation. Experimental results show that the proposed algorithm scales to hundreds of thousands of examples or to the combination of hundreds of kernels, and that it achieves relatively high recall and precision when classifying text e-mail drawn from multiple heterogeneous data sources.
Key words: text classification; SVM; multiple kernel learning
CLC number:
CHEN Lianna, YAO Futian. Algorithm Research on Multiple Kernel Learning SVM for Text Classification[J]. Computer Engineering, 2007, 33(09): 196-198.