Computer Engineering ›› 2013, Vol. 39 ›› Issue (8): 204-207, 214. doi: 10.3969/j.issn.1000-3428.2013.08.044

• Artificial Intelligence and Recognition Technology •

Sense Group Categorization Algorithm for Chinese Text

LI Zhi-tong, YI Jun-kai

  1. (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China)
  • Received:2012-04-18 Online:2013-08-15 Published:2013-08-13
  • About the authors: LI Zhi-tong (born 1985), female, master's student, main research interests: text categorization and information security; YI Jun-kai, professor, Ph.D.
  • Funding:
    Key Project supported by the National High Technology Research and Development Program of China ("863" Program) (2009AA01Z433)

Abstract: Most existing Chinese text categorization algorithms use words, or mappings of words, as feature terms and do not take the grammatical and semantic characteristics of Chinese into account, which leads to poor categorization performance. To address this, a sense group categorization algorithm for Chinese text is proposed. Rules defined over Chinese dependency parsing results are used to extract sense groups, which then serve as the feature terms that represent a text. A Support Vector Machine (SVM) is trained on the training set, and a category sense group library is built and used to classify test texts. Experimental results show that, compared with word-based categorization, the sense group algorithm improves categorization performance by 3 percentage points on average and reaches an average precision of 97%.

Key words: text categorization, sense group, Support Vector Machine (SVM), semantic concept, dependency parsing, category sense group library
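
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual extraction rules, so everything here is an assumption: the relation whitelist `SENSE_GROUP_RELATIONS`, the helper names `extract_sense_groups` and `to_feature_vector`, the `(dependent, head, relation)` triple format, and the toy parse are all hypothetical. The relation labels are only illustrative, in the style of common Chinese dependency tag sets. The sketch shows head/dependent pairs being kept as sense groups and counted into a feature vector of the kind that could then be passed to an SVM.

```python
from collections import Counter

# Hypothetical whitelist of dependency relations treated as forming a
# sense group. The paper's real extraction rules are not in the abstract.
SENSE_GROUP_RELATIONS = {"SBV", "VOB", "ATT"}


def extract_sense_groups(arcs):
    """Return (head, dependent) pairs for arcs with a whitelisted relation.

    `arcs` is a list of (dependent, head, relation) triples, a format
    assumed here for the output of a dependency parser.
    """
    return [
        (head, dependent)
        for dependent, head, relation in arcs
        if relation in SENSE_GROUP_RELATIONS
    ]


def to_feature_vector(sense_groups, vocabulary):
    """Count how often each vocabulary sense group occurs in one document."""
    counts = Counter(sense_groups)
    return [counts[group] for group in vocabulary]


# Toy parse of "公司 发布 新 产品" ("the company releases a new product").
arcs = [
    ("公司", "发布", "SBV"),   # subject -> verb
    ("产品", "发布", "VOB"),   # object -> verb
    ("新", "产品", "ATT"),     # attribute -> noun
    ("发布", "ROOT", "HED"),   # sentence head, not whitelisted, so skipped
]

groups = extract_sense_groups(arcs)
vocabulary = sorted(set(groups))
vector = to_feature_vector(groups, vocabulary)
```

In a full pipeline the vocabulary would be built once from the training set, and the per-document vectors would be the SVM's input. The whitelist approach is only one simple way to turn dependency arcs into multi-word features.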
