Computer Engineering ›› 2009, Vol. 35 ›› Issue (14): 227-229. doi: 10.3969/j.issn.1000-3428.2009.14.079

• Artificial Intelligence and Recognition Technology •

Text Classification Model Based on Orthogonal Decomposition

XIONG Zhong-yang, LI Zhi-xing, ZHANG Yu-fang, JIANG Fan

  1. (School of Computer, Chongqing University, Chongqing 400030)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-07-20 Published: 2009-07-20

Abstract: In text classification, the Vector Space Model (VSM) is the most widely used representation, but it has two drawbacks: very high dimensionality and a warped (distorted) space. This paper proposes a new model based on orthogonal decomposition. Borrowing the orthogonal decomposition of forces from physics, the model maps high-dimensional text vectors into a low-dimensional space whose coordinate axes are the categories, which addresses both drawbacks. Experiments show that, compared with VSM, classification under the new model is considerably faster and precision also improves.

Key words: text classification, orthogonal decomposition, Vector Space Model (VSM)

CLC Number:
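
The abstract describes mapping high-dimensional text vectors into a low-dimensional space whose axes are the categories, but it does not give the construction. The following is only a minimal sketch of one way such a mapping could look, not the authors' method. It assumes that each term acts like a "force" whose component along a category axis is the share of that term's training occurrences falling in that category, and that a document's coordinates are the sum of its terms' components. The function names (`term_category_weights`, `to_category_space`, `classify`) and the toy data are hypothetical.

```python
import numpy as np


def term_category_weights(X, y, n_classes):
    """Return W with shape (n_terms, n_classes).

    Assumption (not from the paper): W[t, c] is the fraction of term t's
    training occurrences that fall in category c.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Per-category term totals, one column per category.
    counts = np.stack([X[y == c].sum(axis=0) for c in range(n_classes)], axis=1)
    totals = counts.sum(axis=1, keepdims=True)
    # Terms never seen in training keep all-zero components.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def to_category_space(X, W):
    """Map (n_docs, n_terms) term vectors to (n_docs, n_classes) coordinates."""
    return np.asarray(X, dtype=float) @ W


def classify(X, W):
    """Assign each document to the category axis with the largest coordinate."""
    return np.argmax(to_category_space(X, W), axis=1)


# Toy term-count matrix: 4 training documents over 4 terms, 2 categories.
X_train = [[2, 1, 0, 0],
           [1, 2, 0, 1],
           [0, 0, 2, 1],
           [0, 1, 1, 2]]
y_train = [0, 0, 1, 1]

W = term_category_weights(X_train, y_train, n_classes=2)
X_new = [[1, 1, 0, 0],
         [0, 0, 1, 2]]
labels = classify(X_new, W)
```

In this sketch a document is reduced from one coordinate per term to one coordinate per category, so the classification step compares only as many numbers as there are categories, which is the kind of speed-up the abstract reports for the low-dimensional space.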