Computer Engineering

• Artificial Intelligence and Recognition Technology •

Research of Document Hierarchical Classification Based on Incremental Mode

GU Ping, LUO Zhi-heng, OUYANG Yuan-you

  1. (College of Computer Science, Chongqing University, Chongqing 400044, China)
  • Received: 2012-12-17 Online: 2014-01-15 Published: 2014-01-13
  • About the authors: GU Ping (born 1976), male, associate professor, Ph.D.; main research interests: machine learning and data mining. LUO Zhi-heng and OUYANG Yuan-you, master's students.
  • Supported by:
    Natural Science Foundation Project of the Chongqing Science and Technology Commission (CSTC2012jjA40002)

Abstract: In hierarchical document classification, blocking and the adaptive adjustment of classifiers are two key issues that affect classification accuracy. To address them, this paper proposes a hierarchical classification model based on category context features, together with an incremental learning algorithm. Following the taxonomy, the algorithm incrementally builds and maintains a category-specific context feature set for each decision node, and uses the support of a document within these feature sets to find the most likely classification path and category. Considering the particular nature of incremental learning, semantic similarity is incorporated into the path confidence calculation to alleviate the incompleteness of the context feature sets. Experimental results show that, compared with hierarchical Bayes and hierarchical SVM models, the algorithm is adaptive and also improves classification accuracy by nearly 8% on the test document set.

Key words: incremental learning, semantic concept, hierarchical classification, self-adaptive, degree of confidence
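
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class name `IncrementalHierarchy`, the toy taxonomy, the definition of support as the fraction of document terms already seen at a node, and the use of mean support as path confidence are all assumptions made for illustration. The semantic similarity term mentioned in the abstract is left out.

```python
from collections import Counter, defaultdict


class IncrementalHierarchy:
    """Toy taxonomy whose nodes keep incrementally updated term counts."""

    def __init__(self, children):
        # children maps each internal node to its child categories (assumed layout).
        self.children = children
        # node -> context feature counts, grown one document at a time.
        self.features = defaultdict(Counter)

    def learn(self, path, terms):
        """Add one labelled document to every node on its category path."""
        for node in path:
            self.features[node].update(terms)

    def support(self, node, terms):
        """Fraction of the document's terms already seen at this node."""
        if not terms:
            return 0.0
        seen = sum(1 for t in terms if self.features[node][t] > 0)
        return seen / len(terms)

    def classify(self, root, terms):
        """Return the root-to-leaf path with the highest mean support."""
        best_path, best_score = None, -1.0
        stack = [(root, [], 0.0)]
        while stack:
            node, path, total = stack.pop()
            kids = self.children.get(node, [])
            if not kids and path:
                # Leaf reached: path confidence is the mean support along the path.
                score = total / len(path)
                if score > best_score:
                    best_path, best_score = path, score
            for kid in kids:
                stack.append((kid, path + [kid], total + self.support(kid, terms)))
        return best_path, best_score


# Hypothetical taxonomy and training documents.
tree = IncrementalHierarchy({
    "root": ["sports", "science"],
    "sports": ["football", "tennis"],
    "science": ["physics"],
})
tree.learn(["sports", "football"], ["goal", "league", "striker"])
tree.learn(["sports", "tennis"], ["serve", "racket", "set"])
tree.learn(["science", "physics"], ["quantum", "energy", "particle"])

path, score = tree.classify("root", ["striker", "goal", "match"])
print(path, round(score, 3))
```

Scoring every root-to-leaf path, rather than committing to one child at each level, is one simple way to keep an early wrong decision from blocking the correct leaf. New documents can be added with `learn` at any time without retraining the other nodes.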

CLC number: