Abstract:
Most Chinese text classification methods rely on machine learning and overlook traditional methods based on decision rules. This paper combines the two approaches in a single classifier, treating the rule-based learner as a component classifier, and proposes an optimized rule induction algorithm that automatically generates “strong” decision rules. Experimental results show that the combined classifier outperforms a standalone machine-learning N-Gram classifier.
Key words:
Text classification,
Document indexing,
Classification rule learning
摘要: 随着基于机器学习的文本自动分类方法成为主流分类技术,基于机器学习的文本分类方法往往忽视了对规则分类方法的有效运用。该文将基于规则的分类思想和基于机器学习的分类方法有机地结合起来,把规则判别看作一个分量分类器,提出了一种辅以规则补充的双层文本分类模型和一种优化的分类规则学习算法。根据该方法设计并实现了一个基于规则和N-Gram统计分类相结合的双层分类器,进行了双层分类模型与单独的N-Gram分类模型的实验,结果表明辅以规则补充的双层分类器具有更好的分类性能。
关键词:
文本分类,
文档索引,
分类规则学习
LIU Jinhong; LU Yuliang; ZHOU Xindong. Ensemble Text Classification Model Supplemented by Strong Rules Learning[J]. Computer Engineering, 2007, 33(08): 165-167.
刘金红;陆余良;周新栋. 一种辅以强规则学习的双层文本分类模型[J]. 计算机工程, 2007, 33(08): 165-167.