Abstract: This paper proposes a text classification algorithm based on normalized vectors and matrix projection. The feature weights used in the projection take into account both the document frequency and the term frequency within each individual category. The three-dimensional space that represents the text features of the training samples is projected onto a two-dimensional space, yielding a normalized feature vector; this reduces the dimensionality of the feature space and improves classification efficiency and accuracy. Comparative experiments against the kNN algorithm show that the proposed algorithm substantially improves both running time and accuracy.
Key words:
text classification,
matrix projection,
vector space model,
normalized vector (NV)
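The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the three dimensions are (category, document, term), collapses the document axis into one weight per (category, term) pair, and L2-normalizes each category vector. The weight used here (term-frequency share multiplied by document-frequency share within the category) and the names `train` and `classify` are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Build one normalized weight vector per category.

    docs: list of (category, tokens) pairs.
    Returns {category: {term: weight}} with unit L2 norm per category.
    """
    tf = defaultdict(Counter)  # tf[c][t]: occurrences of term t in category c
    df = defaultdict(Counter)  # df[c][t]: documents of c that contain t
    n_docs = Counter()         # number of training documents per category

    for category, tokens in docs:
        n_docs[category] += 1
        tf[category].update(tokens)
        df[category].update(set(tokens))

    model = {}
    for c in tf:
        total = sum(tf[c].values())
        # ASSUMPTION: weight = (term-frequency share) * (document-frequency
        # share), both measured inside the single category c. The paper's
        # actual formula is not stated in the abstract.
        raw = {t: (tf[c][t] / total) * (df[c][t] / n_docs[c]) for t in tf[c]}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        model[c] = {t: w / norm for t, w in raw.items()}
    return model


def classify(model, tokens):
    """Return the category whose vector has the highest cosine similarity."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(v * v for v in counts.values()))
    if norm == 0:
        return None  # empty document: nothing to compare

    def score(c):
        return sum(model[c].get(t, 0.0) * v for t, v in counts.items()) / norm

    return max(model, key=score)


# Toy usage with made-up data.
training = [
    ("sports", ["ball", "goal", "ball"]),
    ("sports", ["goal", "team"]),
    ("tech", ["code", "bug"]),
    ("tech", ["code", "compiler", "code"]),
]
model = train(training)
print(classify(model, ["ball", "team"]))
```

In a sketch like this, a new document is compared against one vector per category rather than against every training document, which is the usual reason a centroid-style classifier is cheaper at prediction time than kNN.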
CLC number:
ZHONG Jiang, SUN Qi-Gan, LI Jing. Text Classification Algorithm Based on Normalized Vector[J]. Computer Engineering, 2011, 37(8): 47-49.