
Computer Engineering ›› 2011, Vol. 37 ›› Issue (01): 78-80. doi: 10.3969/j.issn.1000-3428.2011.01.027

• Software Technology and Database •

English Text Retrieval Algorithm Based on Singular Value Decomposition

GAO Shi-long

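  1. (Department of Mathematics, Leshan Normal University, Leshan, Sichuan 614000)
  • Online: 2011-01-05 Published: 2010-12-31
  • About the author: GAO Shi-long (born 1975), male, associate professor and Ph.D. candidate; main research interests: weak signal detection, data mining
  • Funding:
    Sichuan Provincial Department of Education fund project "Detection and Parameter Estimation of Linear Frequency Modulated Signals Based on Chaotic Systems" (09ZB026)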

English Texts Retrieval Algorithm Based on SVD

GAO Shi-long   

  1. (Department of Mathematics, Leshan Normal University, Leshan 614000, China)
  • Online: 2011-01-05 Published: 2010-12-31

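Abstract: An English text retrieval algorithm is proposed. Keyword terms are extracted from the text, and the state matrix of the keyword terms is computed from their transition probabilities. Singular value decomposition is then applied, and the first singular vector is extracted as the complex feature vector. The cosine similarity between vectors serves as the similarity measure for text retrieval. Experimental results show that the algorithm outperforms the traditional LSA algorithm in both retrieval accuracy and computational efficiency.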

Keywords: text retrieval, transition probability, singular value decomposition, state matrix

Abstract: A new retrieval algorithm for English texts is proposed. Keywords are extracted from the English texts, and the state matrix of the keywords is calculated from the transition probability matrix. The first singular vector is obtained through Singular Value Decomposition (SVD) and used as the complex feature vector. The cosine similarity between feature vectors measures the similarity between the query and the documents. Experimental results indicate that this algorithm outperforms the traditional LSA algorithm in both precision and computational efficiency.

Key words: text retrieval, transition probability, Singular Value Decomposition (SVD), state matrix
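
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the construction details, so everything specific here is an assumption: the "state matrix" is taken to be the row-normalized matrix of transitions between consecutive keywords, the keyword vocabulary is fixed by hand, and the first left singular vector is obtained by power iteration on A·Aᵀ rather than by a full SVD, so that only the standard library is needed. The function names (`transition_matrix`, `first_left_singular_vector`, `text_feature`) and the toy documents are hypothetical and are not taken from the paper.

```python
import math

def transition_matrix(tokens, vocab):
    """Row-normalized counts of consecutive keyword pairs (assumed 'state matrix')."""
    index = {word: i for i, word in enumerate(vocab)}
    keys = [index[t] for t in tokens if t in index]  # keep keyword terms only
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(keys, keys[1:]):
        counts[a][b] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transition stay all-zero.
        matrix.append([c / total for c in row] if total > 0 else row)
    return matrix

def first_left_singular_vector(matrix, iterations=200):
    """Power iteration on A*A^T, started from the uniform vector.

    Converges to the leading left singular vector when the largest
    singular value is not repeated.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # w = A^T v, then u = A w, so u = (A A^T) v
        w = [sum(matrix[i][j] * v[i] for i in range(n)) for j in range(n)]
        u = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return [0.0] * n  # no transitions at all
        v = [x / norm for x in u]
    return v

def cosine_similarity(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

def text_feature(text, vocab):
    """Feature vector of a text: first left singular vector of its state matrix."""
    return first_left_singular_vector(transition_matrix(text.lower().split(), vocab))

# Toy data, purely illustrative.
vocab = ["svd", "matrix", "retrieval", "text"]
docs = [
    "svd matrix svd matrix retrieval",
    "text retrieval text retrieval svd",
]
query = "matrix svd matrix retrieval"

q = text_feature(query, vocab)
scores = [cosine_similarity(q, text_feature(d, vocab)) for d in docs]
best = max(range(len(docs)), key=scores.__getitem__)
print(scores, "best match:", docs[best])
```

In this toy run the query shares its keyword transitions with the first document, so that document ranks highest. Because the state matrix is non-negative, the power iteration started from a uniform vector yields a non-negative feature vector, which sidesteps the sign ambiguity a general SVD routine would have.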
