Abstract: Latent Semantic Indexing (LSI) has been applied to many areas of modern information retrieval, but the high computational complexity of Singular Value Decomposition (SVD) prevents its use on large-scale data. This paper proposes a fast LSI method for large-scale data. It presents a unified framework for dimension reduction problems; within this framework LSI, a feature extraction algorithm, can be transformed into a feature selection problem. This transformation simplifies the computation of LSI while preserving its dimension reduction effect as far as possible, making LSI applicable to large-scale data.
Key words:
Latent Semantic Indexing (LSI),
dimension reduction,
feature selection,
feature extraction
WEI Wei (卫威), WANG Jian-min (王建民). Fast Latent Semantic Indexing on Large-scale Dataset[J]. Computer Engineering (计算机工程), 2009, 35(15): 35-37,4.
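For orientation, the standard SVD-based LSI step that the abstract identifies as the bottleneck can be sketched as follows. This is a minimal illustration of conventional LSI, not the fast feature-selection method the paper proposes, because the abstract does not specify that method's selection criterion. The toy matrix and the function names `lsi_truncated_svd` and `fold_in_query` are assumptions made for this example. For a dense m×n term-document matrix, the full SVD costs on the order of O(min(mn², m²n)), which is why it scales poorly.

```python
import numpy as np


def lsi_truncated_svd(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    # Thin SVD of the term-document matrix: term_doc = U @ diag(s) @ Vt.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values and their singular vectors.
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    # Each column of doc_coords is one document in the latent space
    # (equal to u_k.T @ term_doc).
    doc_coords = s_k[:, None] * vt_k
    return u_k, s_k, doc_coords


def fold_in_query(query_vec, u_k):
    """Map a term-space query vector into the same latent space."""
    return u_k.T @ query_vec


# Hypothetical toy data: 5 terms (rows) by 4 documents (columns).
term_doc = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])

u_k, s_k, doc_coords = lsi_truncated_svd(term_doc, k=2)
query_coords = fold_in_query(term_doc[:, 0], u_k)
```

Here the latent features are linear combinations of all terms (feature extraction). A feature-selection variant, as the abstract describes, would instead keep a subset of the original terms, avoiding the full decomposition.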