Abstract:
To handle texts that belong to more than one subject, a multi-subject text categorization algorithm based on fuzzy support vector machines is proposed. Sub-classifiers are trained with the one-against-rest ("1-a-r") method. For a text to be classified, the distance from the text to each hyperplane is computed, a membership vector is derived from these distances, and the subjects to which the text belongs are assigned according to the membership vector. Experimental results show that the algorithm achieves multi-subject text classification while preserving classification accuracy on single-subject texts, and that it performs well in terms of precision, recall, and F1 value.
Key words: support vector machines, membership vector, recall, precision, F1 value
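The classification stage summarized in the abstract (distance to each hyperplane, membership vector, subject labels) can be illustrated with a minimal sketch. The abstract does not state the membership function or the labeling rule, so the logistic mapping, the 0.5 threshold, the fallback to the best-scoring subject, and the toy hyperplanes below are all assumptions made for illustration, not the authors' method. The sub-classifiers are assumed to be already trained by a one-against-rest procedure and are represented only by their hyperplane parameters (w, b).

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def membership(distance, scale=1.0):
    """ASSUMPTION: squash a signed distance into (0, 1) with a logistic curve.

    The paper's actual membership function is not given in the abstract.
    """
    return 1.0 / (1.0 + math.exp(-distance / scale))


def classify(x, hyperplanes, threshold=0.5):
    """Return (membership_vector, labels) for the feature vector x.

    hyperplanes maps a subject name to the (w, b) of its one-against-rest
    sub-classifier. The threshold is a hypothetical labeling rule.
    """
    vector = {
        subject: membership(signed_distance(w, b, x))
        for subject, (w, b) in hyperplanes.items()
    }
    labels = [s for s, m in vector.items() if m >= threshold]
    if not labels:
        # ASSUMPTION: every text receives at least one subject,
        # so fall back to the subject with the largest membership.
        labels = [max(vector, key=vector.get)]
    return vector, labels


def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for one text's predicted vs. true subjects."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy hyperplanes in a 2-D feature space (hypothetical values).
HYPERPLANES = {
    "sports": ([1.0, 0.0], 0.0),
    "finance": ([0.0, 1.0], 0.0),
    "health": ([-1.0, -1.0], 0.0),
}

if __name__ == "__main__":
    vector, labels = classify([2.0, 1.0], HYPERPLANES)
    print(vector)
    print(labels)
    print(precision_recall_f1(labels, ["sports"]))
```

In this toy setup a text lying on the positive side of two hyperplanes receives two subject labels, while a text on the positive side of only one hyperplane is treated as a single-subject text.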
QIN Yu-ping; WANG Xiu-kun; AI Qing; LIU Wei-jiang. Algorithm for Multi-subject Text Classification[J]. Computer Engineering, 2008, 34(2): 190-192.
秦玉平;王秀坤;艾 青;刘卫江. 多主题文本分类的实现算法[J]. 计算机工程, 2008, 34(2): 190-192.