
Computer Engineering ›› 2009, Vol. 35 ›› Issue (13): 26-27,5. doi: 10.3969/j.issn.1000-3428.2009.13.009

• Software Technology and Database •

Topic Classification of Chinese Document Based on NMF

ZHANG Lei, FENG Xiao-sen, XIANG Xue-zhi

  1. (Information and Communication Engineering College, Harbin Engineering University, Harbin 150001)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-07-05 Published: 2009-07-05

Abstract: This paper presents a method for topic classification of Chinese documents based on Non-negative Matrix Factorization (NMF). The NMF algorithm decomposes the term-document matrix to capture the correlations between terms, which effectively mitigates the effects of synonymy and polysemy. Experimental results show that, compared with Latent Semantic Indexing (LSI) based on Singular Value Decomposition (SVD), the method computes faster and uses less storage. When the number of latent semantic dimensions is reduced substantially, the NMF method also achieves better classification precision.

Key words: topic classification, Non-negative Matrix Factorization (NMF), Latent Semantic Indexing (LSI)
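
The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not state which NMF update rule, term weighting, or classifier the authors used, so everything below is an assumption: the sketch uses the standard Lee-Seung multiplicative updates for the Frobenius objective, a toy term-document matrix invented for illustration, and a simple "largest latent weight" rule to assign each document a topic. The helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate non-negative v (terms x docs) as w (terms x rank) times h (rank x docs).

    Assumption: Lee-Seung multiplicative updates minimising the Frobenius error.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    # Strictly positive random start, so the updates keep every entry non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_terms)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Toy term-document counts (rows: terms, columns: documents), invented for illustration.
# Terms 0-2 occur only in documents 0-1; terms 3-5 occur only in documents 2-3.
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
]

W, H = nmf(V, rank=2)

# Assumed decision rule: a document's topic is the latent dimension with the largest weight in H.
doc_topics = [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(V[0]))]
```

Here each column of `W` groups terms that tend to co-occur, which is how a factorization of this kind can relate synonyms, and each column of `H` gives a document's weights over the latent dimensions. On a real corpus one would more likely rely on an existing implementation such as scikit-learn's `sklearn.decomposition.NMF` rather than pure-Python loops.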

CLC Number: