Computer Engineering, 2006, Vol. 32, Issue 19: 183-184. doi: 10.3969/j.issn.1000-3428.2006.19.067

Section: Artificial Intelligence and Recognition Technology

Chinese Text Categorization Based on Alternative Covering Algorithm

LIU Zhengyi¹ (刘政怡), GONG Jiancheng² (龚建成), WU Jianguo¹ (吴建国)

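  1. Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, Hefei 230039; 2. Department of Mechanical Engineering, Anhui University of Technology and Science, Wuhu 241000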
  Online: 2006-10-05; Published: 2006-10-05

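Abstract: The biggest problem in text categorization based on the vector space model (VSM) is that the vectors, which use words as feature items, have too many dimensions, so feature selection is normally required. The input of the alternative covering algorithm, by contrast, is a set of points in n-dimensional Euclidean space, so the number of dimensions can be disregarded. Texts can therefore be represented as precisely as possible before they are classified, which greatly improves accuracy. Applied as a classifier to Chinese text categorization, the alternative covering algorithm achieves good results, reaching an accuracy of 98.32% in closed tests (i.e., tests on the training documents).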

Key words: Text categorization, Alternative covering algorithm, Chinese information processing
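The abstract names the alternative covering algorithm without describing it. The following is a minimal Python sketch of one common formulation, written under stated assumptions rather than taken from the paper. Documents are assumed to be already word-segmented and are mapped to term-frequency vectors in the VSM. Each vector x is lifted onto a sphere of radius R by appending sqrt(R^2 - |x|^2), so that a cover is a spherical cap {p : <a, p> >= theta} around a center a. Covers are built one class at a time in turn, each only against the points that earlier covers left uncovered, with theta placed midway between the closest other-class point and the farthest covered same-class point. The helper names (build_index, to_vector, project, train, classify), the choice of the first uncovered point as the center, and the fallback rule for points outside every cover are hypothetical.

```python
import math
from collections import Counter

def build_index(docs):
    """Map every word in the (pre-segmented) training docs to a VSM dimension."""
    vocab = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(vocab)}

def to_vector(doc, index):
    """Term-frequency vector; words outside the training vocabulary are ignored."""
    v = [0.0] * len(index)
    for w, c in Counter(doc).items():
        if w in index:
            v[index[w]] = float(c)
    return v

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project(x, radius):
    """Lift x from R^n onto the sphere of the given radius in R^(n+1).
    Points longer than the radius (possible at test time) get 0 as the extra coordinate."""
    return x + [math.sqrt(max(radius * radius - dot(x, x), 0.0))]

def train(vectors, labels):
    """Alternating covering (sketch): build one cover per class in turn,
    each against the points that are still uncovered, until none remain."""
    radius = max(math.sqrt(dot(v, v)) for v in vectors)  # any R >= max norm works
    pts = [project(v, radius) for v in vectors]
    classes = sorted(set(labels))
    remaining = list(range(len(pts)))
    covers = []  # (center, theta, label) in construction order
    while remaining:
        for cls in classes:
            own = [i for i in remaining if labels[i] == cls]
            if not own:
                continue
            center = pts[own[0]]  # hypothetical choice: first uncovered point of this class
            ips = {i: dot(center, pts[i]) for i in remaining}
            # On the sphere a larger inner product means a closer point.
            # d1: inner product with the closest uncovered point of another class.
            d1 = max((ips[i] for i in remaining if labels[i] != cls), default=-math.inf)
            covered = [i for i in own if ips[i] > d1]
            if not covered:  # degenerate: center duplicates another class's point
                covered = [own[0]]
            # d2: inner product with the farthest same-class point being covered.
            d2 = min(ips[i] for i in covered)
            theta = (d1 + d2) / 2 if d1 > -math.inf else d2
            covers.append((center, theta, cls))
            done = set(covered)
            remaining = [i for i in remaining if i not in done]
    return radius, covers

def classify(model, vector):
    """Return the label of the first cover, in construction order, containing the point."""
    radius, covers = model
    p = project(vector, radius)
    for center, theta, cls in covers:
        if dot(center, p) >= theta:
            return cls
    # Hypothetical fallback: a point outside every cover goes to the cover it
    # misses by the least (largest <center, p> - theta).
    return max(covers, key=lambda c: dot(c[0], p) - c[1])[2]

# Toy example; the documents are assumed to be already word-segmented.
train_docs = [
    ["股票", "市场", "上涨", "股票"],
    ["银行", "利率", "市场"],
    ["球队", "比赛", "进球"],
    ["比赛", "球员", "进球", "进球"],
]
labels = ["finance", "finance", "sports", "sports"]
index = build_index(train_docs)
model = train([to_vector(d, index) for d in train_docs], labels)
print(classify(model, to_vector(["股票", "上涨"], index)))  # → finance
print(classify(model, to_vector(["比赛", "进球"], index)))  # → sports
```

Trying the covers in construction order matters in this sketch: a later cover was checked only against points that earlier covers left uncovered, so on its own it may also contain training points of other classes that an earlier cover already claimed. With distinct training vectors, this ordering classifies every training point correctly.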
