Abstract:
A new retrieval algorithm for English texts is proposed. Keywords are extracted from each text, and a state matrix of the keywords is computed from their transition probability matrix. Singular value decomposition (SVD) is applied to the state matrix, and the first singular vector is taken as the complex feature vector of the text. The cosine similarity between feature vectors measures the similarity between the query and each document. Experimental results indicate that the algorithm outperforms the traditional latent semantic analysis (LSA) algorithm in both retrieval precision and computational efficiency.
Keywords:
text retrieval,
transition probability,
singular value decomposition (SVD),
state matrix
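The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not define the state matrix, so the code assumes it is simply the row-normalized matrix of transitions between consecutive keywords over a shared vocabulary. That definition, the function names, and the use of power iteration in place of a full SVD routine are all assumptions made for illustration, not the paper's actual method.

```python
import math


def transition_matrix(keywords, vocab):
    """Row-normalized counts of consecutive keyword pairs.

    ASSUMPTION: this stands in for the paper's "state matrix",
    whose exact definition is not given in the abstract.
    """
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    seq = [w for w in keywords if w in index]  # keep vocabulary terms only
    for a, b in zip(seq, seq[1:]):
        counts[index[a]][index[b]] += 1.0
    for row in counts:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return counts


def first_singular_vector(m, iters=200):
    """Dominant left singular vector of m, by power iteration on m * m^T.

    Starting from a positive vector keeps the result non-negative for a
    non-negative matrix, which avoids the usual sign ambiguity.
    """
    n = len(m)
    u = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        v = [sum(m[i][j] * u[i] for i in range(n)) for j in range(n)]  # m^T u
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]  # m v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no transitions observed
        u = [x / norm for x in w]
    return u


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def rank(query, docs, vocab):
    """Return (similarity, document index) pairs, best match first."""
    q = first_singular_vector(transition_matrix(query, vocab))
    scores = [
        (cosine(q, first_singular_vector(transition_matrix(d, vocab))), i)
        for i, d in enumerate(docs)
    ]
    return sorted(scores, reverse=True)
```

Calling `rank(query, docs, vocab)` with keyword lists for the query and for each document returns the documents ordered by decreasing similarity. A production version would more likely call a library SVD routine; power iteration is used here only to keep the sketch dependency-free.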
GAO Shi-Long. English Texts Retrieval Algorithm Based on SVD[J]. Computer Engineering, 2011, 37(01): 78-80.
高仕龙. 基于奇异值分解的英文文本检索算法[J]. 计算机工程, 2011, 37(01): 78-80.