
Computer Engineering ›› 2011, Vol. 37 ›› Issue (8): 47-49. doi: 10.3969/j.issn.1000-3428.2011.08.016

• Software Technology and Database •

Text Classification Algorithm Based on Normalized Vector

ZHONG Jiang, SUN Qi-gan, LI Jing

  1. (College of Computer Science, Chongqing University, Chongqing 400044)
  • Publication date: 2011-04-20 Release date: 2012-10-31
  • About the authors: ZHONG Jiang (born 1974), male, associate professor, Ph.D.; main research interests: text analysis, data mining, and knowledge management. SUN Qi-gan and LI Jing, master's students.
  • Funding:
    National Key Technology R&D Program of China (2008BAH37B04); Key Project of the Natural Science Foundation of Chongqing (CSTC2008BB2195)

Abstract: This paper proposes a text classification algorithm based on normalized vectors and matrix projection. Its main idea is to project the three-dimensional feature space of the training samples onto a two-dimensional space, yielding normalized feature vectors. In the matrix projection, each feature weight takes into account both the document frequency and the term frequency within a single category, which reduces the dimensionality of the feature space and improves classification efficiency and accuracy. Comparative experiments against the kNN algorithm show that the proposed algorithm substantially improves both running time and accuracy.

Key words: text classification, matrix projection, vector space model, normalized vector (NV)

CLC number:
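
The abstract does not give the weighting formula, so the following is only a rough sketch of the general idea it describes: collapsing per-document term counts into one normalized vector per category. The weight used here (within-category term-frequency share multiplied by within-category document-frequency share), the dot-product scoring rule, and the names `train_class_vectors` and `classify` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def train_class_vectors(docs, labels):
    """Collapse (class, document, term) counts into one unit vector per class.

    Assumed weight for term t in class c (not taken from the paper):
        w[c][t] = tf_share(t, c) * df_share(t, c)
    tf_share is the term's share of all tokens in class c;
    df_share is the fraction of class-c documents containing the term.
    """
    tf = defaultdict(Counter)   # class -> term -> occurrences within the class
    df = defaultdict(Counter)   # class -> term -> documents containing the term
    n_docs = Counter(labels)    # class -> number of training documents

    for tokens, label in zip(docs, labels):
        tf[label].update(tokens)
        df[label].update(set(tokens))

    vectors = {}
    for label in tf:
        total_tokens = sum(tf[label].values())
        weights = {
            term: (count / total_tokens) * (df[label][term] / n_docs[label])
            for term, count in tf[label].items()
        }
        # L2-normalize so every class is represented by a unit-length vector.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors[label] = {term: w / norm for term, w in weights.items()}
    return vectors


def classify(tokens, vectors):
    """Return the class whose normalized vector best matches the document."""
    counts = Counter(tokens)
    scores = {
        label: sum(vec.get(term, 0.0) * n for term, n in counts.items())
        for label, vec in vectors.items()
    }
    return max(scores, key=scores.get)


# Toy data, invented for this example only.
train_docs = [
    ["stock", "market", "price"],
    ["market", "trade", "price"],
    ["match", "goal", "team"],
    ["team", "goal", "coach"],
]
train_labels = ["finance", "finance", "sports", "sports"]

class_vectors = train_class_vectors(train_docs, train_labels)
prediction = classify(["price", "market", "rise"], class_vectors)
```

Under these assumptions, classifying a document costs one sparse dot product per category, whereas kNN compares the document against every training sample. That difference is one plausible source of the time savings the abstract reports.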