
Computer Engineering ›› 2008, Vol. 34 ›› Issue (18): 87-88. doi: 10.3969/j.issn.1000-3428.2008.18.031

• Software Technology and Database •

Research and Implementation of Real-time Text Categorization System

HUANG Xu, ZHU Yan-qin, LUO Xi-zhao   

  1. (School of Computer Science and Technology, Soochow University, Suzhou 215006)
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-09-20 Published:2008-09-20

实时文本分类系统的研究与实现

黄 旭,朱艳琴,罗喜召   

  1. (苏州大学计算机科学与技术学院,苏州 215006)

Abstract: This paper analyzes the factors that limit real-time performance in text categorization, namely the high time cost of word segmentation and the excessively high dimension of the feature space. Targeting the real-time application of Web page filtering, it proposes a real-time text categorization approach that increases categorization speed by reducing word segmentation processing and lowering the dimension of the feature space. Categorization quality is maintained by optimizing the selection of feature items, and a real-time text classifier is implemented based on Bayesian theory. Experimental results show that the approach significantly increases categorization speed while maintaining precision and recall at 85% and 94%, respectively.
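
The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing, and the class name `KeywordNaiveBayes`, the feature terms, and the labels are all hypothetical. Feature terms are counted by plain substring matching against a small pre-selected list, which stands in for both "reduced word segmentation" and "lower feature-space dimension".

```python
import math
from collections import Counter


class KeywordNaiveBayes:
    """Multinomial naive Bayes over a small, fixed list of feature terms.

    Illustrative only: the term list and smoothing choice are assumptions,
    not details taken from the paper.
    """

    def __init__(self, terms):
        self.terms = list(terms)   # hypothetical pre-selected feature terms
        self.priors = {}           # label -> log P(label)
        self.likelihoods = {}      # label -> {term: log P(term | label)}

    def _counts(self, text):
        # Count each feature term by substring matching, so no general
        # word segmenter is run. Note this also matches inside longer words.
        return Counter({t: text.count(t) for t in self.terms})

    def fit(self, docs, labels):
        docs_per_label = Counter(labels)
        term_totals = {}
        for text, label in zip(docs, labels):
            term_totals.setdefault(label, Counter()).update(self._counts(text))
        for label, counts in term_totals.items():
            self.priors[label] = math.log(docs_per_label[label] / len(labels))
            # Laplace (add-one) smoothing over the fixed term list.
            denom = sum(counts.values()) + len(self.terms)
            self.likelihoods[label] = {
                t: math.log((counts[t] + 1) / denom) for t in self.terms
            }
        return self

    def predict(self, text):
        counts = self._counts(text)
        scores = {
            label: self.priors[label]
            + sum(counts[t] * self.likelihoods[label][t] for t in self.terms)
            for label in self.priors
        }
        return max(scores, key=scores.get)


# Toy usage with made-up training data.
terms = ["casino", "bet", "jackpot", "weather", "report"]
docs = [
    "casino bet win bet",
    "casino jackpot",
    "weather report today",
    "school report weather",
]
labels = ["block", "block", "allow", "allow"]
clf = KeywordNaiveBayes(terms).fit(docs, labels)
```

Because the per-document cost depends only on the number of feature terms, keeping that list short is what bounds classification time in this sketch; how the paper actually selects its feature items is not stated in the abstract.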

Key words: information security, content security, text categorization

摘要: 分析文本分类过程中影响实时性的因素,即分词处理高耗时和特征空间维数过高问题。结合网页过滤的实时应用提出一种实时文本分类方法,弱化分词处理过程,降低特征空间维数,以提高分类速度。通过优化特征项选取维持分类效果,基于贝叶斯理论实现实时文本分类系统。实验结果表明,该方法在维持精确率和召回率分别为85%, 94%的情况下,显著提高了分类速度。

关键词: 信息安全, 内容安全, 文本分类
