
Computer Engineering ›› 2007, Vol. 33 ›› Issue (08): 165-167. doi: 10.3969/j.issn.1000-3428.2007.08.057

• Artificial Intelligence and Recognition Technology •

Ensemble Text Classification Model Supplemented by Strong Rules Learning

LIU Jinhong1, LU Yuliang1, ZHOU Xindong2   

  1. Department of Network Engineering, PLA Electronic Engineering Institute, Hefei 230037; 2. School of Computer, National University of Defense Technology, Changsha 410073
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-04-20 Published:2007-04-20

Chinese title (translated): A Two-Layer Text Classification Model Supplemented by Strong Rule Learning

刘金红1,陆余良1,周新栋2   

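  1. Department of Network Engineering, PLA Electronic Engineering Institute, Hefei 230037; 2. School of Computer, National University of Defense Technology, Changsha 410073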

Abstract: Most Chinese text classification methods rely on machine learning techniques while ignoring traditional methods based on decision rules. This paper combines the two into a single classifier, treating the rule-based learner as a component classifier, and proposes a new optimized rule induction algorithm for automatically generating "strong" decision rules. Experimental results show that the hybrid classifier outperforms a standalone machine-learning-based N-Gram classifier.

Key words: Text classification, Document index, Classification rules learning

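Abstract (translated from the Chinese original): As automatic text classification based on machine learning has become the mainstream classification technique, such methods often neglect the effective use of rule-based classification. This paper combines the rule-based classification idea with machine-learning-based classification, treats rule-based discrimination as a component classifier, and proposes a two-layer text classification model supplemented by rules together with an optimized classification rule learning algorithm. Based on this approach, a two-layer classifier combining rules with N-Gram statistical classification was designed and implemented, and experiments compared the two-layer model with a standalone N-Gram classification model. The results show that the two-layer classifier supplemented by rules achieves better classification performance.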

Key words (translated from the Chinese original): Text classification, Document index, Classification rule learning
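
The two-layer arrangement described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the two layers are combined, how "strong" rules are represented, or which N-Gram model is used, so everything below is an assumption made for illustration: rules are hand-written sets of terms that must all occur in a document, the rule layer is consulted first, and the fallback is a character n-gram naive Bayes classifier with add-one smoothing. The names `NGramClassifier` and `TwoLayerClassifier` are hypothetical, and the paper's rule induction algorithm is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramClassifier:
    """Character n-gram naive Bayes with add-one smoothing (assumed model)."""

    def __init__(self, n=3):
        self.n = n
        self.gram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            grams = char_ngrams(doc, self.n)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, doc):
        grams = char_ngrams(doc, self.n)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            total = sum(self.gram_counts[label].values())
            # Log prior plus smoothed log likelihood of every n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for g in grams:
                count = self.gram_counts[label][g]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLayerClassifier:
    """Rule layer first; statistical layer when no rule fires (assumed scheme)."""

    def __init__(self, rules, fallback):
        self.rules = rules        # list of (required_terms, label) pairs
        self.fallback = fallback  # any object with a predict(doc) method

    def predict(self, doc):
        for required_terms, label in self.rules:
            if all(term in doc for term in required_terms):
                return label, "rule"
        return self.fallback.predict(doc), "ngram"


# Toy data, invented purely for this example.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "stock markets fell on inflation fears",
]
train_labels = ["sports", "sports", "finance", "finance"]

ngram = NGramClassifier(n=3).fit(train_docs, train_labels)
clf = TwoLayerClassifier(rules=[({"interest rates"}, "finance")], fallback=ngram)

print(clf.predict("interest rates may rise again"))  # decided by the rule layer
print(clf.predict("a goal in the football match"))   # decided by the n-gram layer
```

Returning the deciding layer alongside the label makes it easy to measure how often the rule layer fires, which is one way a hybrid of this kind could be compared against the standalone N-Gram classifier.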
