Computer Engineering, 2012, Vol. 38, Issue (23): 162-165. doi: 10.3969/j.issn.1000-3428.2012.23.040

• Artificial Intelligence and Recognition Technology •

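Chinese Blog Classification Based on Minimum Enclosing Ball

FU Xiang-hua, GUO Wu-biao, LIU Guo, WANG Zhi-qiang   

  1. (College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China)
  • Received: 2011-12-12 Online: 2012-12-05 Published: 2012-12-03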
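  • About the authors: FU Xiang-hua (1977-), male, associate professor, Ph.D., CCF member; main research interests: data mining, information retrieval. GUO Wu-biao and LIU Guo, master's students. WANG Zhi-qiang, professor.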
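  • Funding:
    National Natural Science Foundation of China (60903114, 60973100); Natural Science Foundation of Guangdong Province (7301329); Science and Technology Plan of Shenzhen (JC201005280463A, JC201105160498A)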

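Abstract: A Chinese blog topic classification method based on the approximate minimum enclosing ball (MEB) principle is proposed. Following this principle, the optimization problem of the Support Vector Machine (SVM) is transformed into an approximate MEB problem, so that only a core subset of the large-scale dataset needs to take part in training the classifier. This improves the ability of blog topic classification to handle large training sets. Feature selection and topic classification experiments on Chinese blogs are carried out on a relatively large blog dataset. Experimental results show that the method reaches accuracy comparable to that of a standard SVM while reducing training time, giving good blog topic classification performance.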

Key words: Blog classification, approximate minimum enclosing ball, Support Vector Machine (SVM), core vector machine, data mining, new media
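The abstract names the key step (solving the SVM as an approximate MEB so that only a core subset of the data is used) but does not give the algorithm. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes the two-class L2-SVM reduction used by core vector machines: with a Gaussian kernel (constant diagonal), the transformed kernel k̃(z_i, z_j) = y_i y_j (k(x_i, x_j) + 1) + δ_ij / C turns the SVM dual into an MEB dual. It then swaps in the simple Bădoiu and Clarkson iteration, which moves the ball centre toward the current farthest point for ⌈1/ε²⌉ steps, in place of the small quadratic program a full core vector machine solves on the core set. The function names and the parameters `C`, `gamma` and `eps` are hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix; k(x, x) = 1, so the diagonal is constant."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def cvm_kernel(X, y, C=10.0, gamma=1.0):
    """Assumed transformed kernel k~(z_i, z_j) = y_i y_j (k(x_i, x_j) + 1) + delta_ij / C.

    With this kernel the two-class L2-SVM dual has the form of an MEB dual.
    """
    K = rbf_kernel(X, X, gamma)
    return np.outer(y, y) * (K + 1.0) + np.eye(len(y)) / C


def approx_meb(Kt, eps=0.05):
    """Badoiu-Clarkson iteration for an approximate MEB in feature space.

    The centre is stored as c = sum_i alpha_i phi(z_i). Each step moves c a
    1/(t+1) fraction of the way toward the point farthest from it, so c is the
    average of the points picked so far.
    """
    n = Kt.shape[0]
    diag = np.diag(Kt)
    alpha = np.zeros(n)
    alpha[0] = 1.0  # start the centre at an arbitrary training point
    steps = int(np.ceil(1.0 / eps ** 2))
    for t in range(1, steps + 1):
        # ||c - phi(z_l)||^2 for every l, computed from kernel values only
        d2 = alpha @ Kt @ alpha - 2.0 * (Kt @ alpha) + diag
        far = int(np.argmax(d2))
        alpha *= t / (t + 1.0)
        alpha[far] += 1.0 / (t + 1.0)
    d2 = alpha @ Kt @ alpha - 2.0 * (Kt @ alpha) + diag
    radius = float(np.sqrt(max(d2.max(), 0.0)))
    core_set = np.flatnonzero(alpha > 0)  # indices of points that carry weight
    return alpha, radius, core_set


def predict(alpha, X, y, X_new, gamma=1.0):
    """Sign of f(x) = sum_i alpha_i y_i (k(x_i, x) + 1), i.e. w.phi(x) + b with b = sum_i alpha_i y_i."""
    scores = (alpha * y) @ (rbf_kernel(X, X_new, gamma) + 1.0)
    return np.where(scores >= 0, 1, -1)


if __name__ == "__main__":
    # Toy data: two tight, well-separated clusters labelled -1 and +1.
    X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                  [3.0, 3.0], [3.0, 3.1], [3.1, 3.0]])
    y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
    alpha, radius, core = approx_meb(cvm_kernel(X, y), eps=0.1)
    print("core set:", core)
    print("labels:", predict(alpha, X, y, np.array([[0.05, 0.05], [3.05, 3.05]])))
```

Because each step adds weight to exactly one point, at most ⌈1/ε²⌉ + 1 training points ever receive nonzero weight, independent of the size of the training set; these points form the core set. Each step in this sketch still scans every point to find the farthest one, so it shows the core-set idea rather than an optimized trainer.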
