
Computer Engineering ›› 2009, Vol. 35 ›› Issue (15): 35-37,4. doi: 10.3969/j.issn.1000-3428.2009.15.012

• Software Technology and Database •

Fast Latent Semantic Indexing on Large-scale Dataset

WEI Wei (卫威)¹, WANG Jian-min (王建民)²

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; 2. School of Software, Tsinghua University, Beijing 100084
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-08-05 Published: 2009-08-05

Abstract: Latent Semantic Indexing (LSI) has been applied to many areas of modern information retrieval, but the high computational complexity of Singular Value Decomposition (SVD) hinders its use on large-scale datasets. This paper proposes a fast LSI method for large-scale data. It presents a unified framework for the dimension reduction problem; within this framework LSI, a feature extraction algorithm, can be transformed into a feature selection problem. The resulting technique preserves the dimension-reduction effect of LSI as far as possible while simplifying its computation, which makes LSI applicable to large-scale data.

Key words: Latent Semantic Indexing (LSI), dimension reduction, feature selection, feature extraction
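
For context, the baseline that the abstract refers to, standard LSI computed with a dense SVD, might be sketched as below. This is only an illustration of the conventional approach whose cost motivates the paper; it is not the authors' fast method, which the abstract does not spell out. The names `lsi_fit` and `lsi_project` and the toy term-document matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Rank-k LSI of a (terms x documents) matrix via a dense SVD.

    The SVD of an m x n matrix costs roughly O(m * n * min(m, n)),
    which is the step that becomes expensive on large collections.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    # One k-dimensional row per document: the columns of Sigma_k @ V_k^T.
    doc_vecs = (s_k[:, None] * vt_k).T
    return u_k, s_k, doc_vecs


def lsi_project(u_k, vec):
    """Map a term-space vector (e.g. a query) into the latent space."""
    return u_k.T @ vec


# Hypothetical toy data: 5 terms (rows) x 4 documents (columns).
term_doc = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])
u_k, s_k, doc_vecs = lsi_fit(term_doc, k=2)
```

Projecting a document's own term column with `lsi_project` gives back its row in `doc_vecs`, so queries and documents can be compared in the same reduced space, for example by cosine similarity.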
