Computer Engineering ›› 2008, Vol. 34 ›› Issue (2): 190-192. doi: 10.3969/j.issn.1000-3428.2008.02.063

• Artificial Intelligence and Recognition Technology •

Algorithm for Multi-subject Text Classification

QIN Yu-ping1,2, WANG Xiu-kun1, AI Qing2, LIU Wei-jiang3

  1. School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116024; 2. College of Information Science and Technology, Bohai University, Jinzhou 121000; 3. Post Doctoral Station for Computer Science and Technology, Southeast University, Nanjing 210096
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-01-20 Published: 2008-01-20

Abstract: To handle texts that belong to more than one subject, a multi-subject text categorization algorithm based on fuzzy support vector machines is proposed. Sub-classifiers are trained with the one-against-rest ("1-a-r") method. For a text to be classified, the distance from the text to each hyperplane is computed, a membership vector is derived from these distances, and the subjects of the text are assigned according to the membership vector. Experimental results show that the algorithm achieves multi-subject text classification while preserving classification precision on single-subject texts, and that it performs well on precision, recall, and F1 value.

Key words: support vector machines, membership vector, recall, precision, F1 value
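
The classification step outlined in the abstract (distance to each hyperplane, then a membership vector, then subject assignment) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the membership function or the decision rule, so the logistic mapping, the `scale` and `threshold` parameters, the fallback to the single best subject, and the example hyperplanes are all assumptions introduced here. The hyperplanes stand in for already-trained one-against-rest linear sub-classifiers.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def membership_vector(hyperplanes, x, scale=1.0):
    """One membership degree per subject.

    Assumption: distances are squashed into (0, 1) with a logistic
    function; the paper's actual membership function may differ.
    """
    return [1.0 / (1.0 + math.exp(-scale * signed_distance(w, b, x)))
            for w, b in hyperplanes]

def assign_subjects(memberships, threshold=0.5):
    """Indices of all subjects whose membership reaches the threshold.

    Assumption: if no subject reaches it, fall back to the single
    subject with the highest membership (single-subject case).
    """
    chosen = [k for k, mu in enumerate(memberships) if mu >= threshold]
    if not chosen:
        chosen = [max(range(len(memberships)), key=memberships.__getitem__)]
    return chosen

# Hypothetical hyperplanes (w, b) for three subjects, one per sub-classifier.
hyperplanes = [([1.0, 0.0], -1.0), ([0.0, 1.0], -1.0), ([-1.0, -1.0], 0.0)]
doc = [2.0, 3.0]  # hypothetical feature vector of the text to classify

mu = membership_vector(hyperplanes, doc)
subjects = assign_subjects(mu)  # [0, 1]: the text is labelled with two subjects
```

With a threshold of 0.5, a subject is assigned exactly when the text lies on the positive side of that subject's hyperplane, so a text can receive several labels at once; raising the threshold would make multi-subject assignment more conservative.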
