Fast Latent Semantic Indexing on Large-scale Dataset

doi:10.3969/j.issn.1000-3428.2009.15.012

Computer Engineering, 2009, Vol. 35, Issue 15: 35-37,4.

• Software Technology and Database •

WEI Wei¹, WANG Jian-min²

(1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; 2. School of Software, Tsinghua University, Beijing 100084)

Online: 2009-08-05 Published: 2009-08-05

Chinese title: 一种大规模数据的快速潜在语义索引

Chinese author names: 卫威¹, 王建民²

Abstract

Latent Semantic Indexing (LSI) has been successfully applied to various fields in modern information retrieval. However, the high computational complexity of Singular Value Decomposition (SVD) makes LSI impractical to apply to large-scale datasets. This paper proposes a fast LSI approach to solve this problem. It presents a unified framework for the dimension reduction problem. Within this framework, LSI, which is a feature extraction method, can be transformed into a feature selection method. This strategy significantly simplifies the computation of LSI.

Keywords: Latent Semantic Indexing (LSI), dimension reduction, feature selection, feature extraction

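Abstract (translated from the Chinese): Latent Semantic Indexing (LSI) has been applied in many areas of modern information retrieval, but the high complexity of matrix singular value decomposition has hindered its use on large-scale data. This paper proposes a fast LSI method for large-scale data. It gives a unified framework for the dimension reduction problem; within this framework, LSI, a feature extraction algorithm, can be converted into a feature selection problem. The technique simplifies the computation of LSI while preserving its dimension reduction effect as far as possible, so that LSI can be applied to large-scale data.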

Keywords (translated from the Chinese): latent semantic indexing, dimension reduction, feature selection, feature extraction
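
For context, the baseline that the abstract describes as too costly, LSI computed through a truncated SVD, can be sketched as follows. This is a minimal NumPy illustration of standard LSI only; it is not the paper's fast method, which the abstract does not spell out. The function name `lsi_truncated_svd` and the toy term-document matrix are assumptions made for illustration.

```python
import numpy as np

def lsi_truncated_svd(term_doc, k):
    """Baseline LSI: keep the top-k singular triplets of a term-document matrix."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]                      # term-to-concept map (terms x k)
    s_k = s[:k]                         # top-k singular values
    docs_k = s_k[:, None] * Vt[:k, :]   # documents in concept space (k x docs)
    return U_k, s_k, docs_k

# Hypothetical toy data: 4 terms (rows) x 3 documents (columns).
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0],
])

U_k, s_k, docs_k = lsi_truncated_svd(A, k=2)

# Rank-k approximation of the original matrix.
A_k = U_k @ docs_k

# A new query (a term-count vector) is folded into the same concept space.
query = np.array([1.0, 1.0, 0.0, 0.0])
query_k = U_k.T @ query
```

Each concept dimension here is a linear combination of all terms, which is what makes LSI a feature extraction method, and the dense SVD of an m x n matrix costs on the order of min(mn², m²n) operations. A feature selection method would instead keep a subset of the original terms.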

CLC Number: TP311

WEI Wei; WANG Jian-min. Fast Latent Semantic Indexing on Large-scale Dataset[J]. Computer Engineering, 2009, 35(15): 35-37,4.

卫威;王建民. 一种大规模数据的快速潜在语义索引[J]. 计算机工程, 2009, 35(15): 35-37,4.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2009.15.012

http://www.ecice06.com/EN/Y2009/V35/I15/35
