Computer Engineering ›› 2009, Vol. 35 ›› Issue (15): 32-34.

• Software Technology and Database •

Latent Document Similarity Model

JIA Xi-ping¹, LIU Hai-zhu²

  1. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665; 2. Zengcheng College, South China Normal University, Guangzhou 511363
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-08-05 Published: 2009-08-05

Abstract: This paper proposes a Latent Document Similarity Model (LDSM). Each pair of documents is treated as a bipartite graph whose nodes are the latent topics of the two documents. Each edge is weighted by the weighted similarity between the two topics it connects, and the similarity of the two documents is given by the optimal matching of that bipartite graph. Experimental results show that LDSM achieves higher average precision and higher average recall than a document similarity model built with TextTiling and bipartite-graph optimal matching.

Key words: topic, document similarity, document retrieval, information retrieval

CLC Number:
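
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how latent topics are extracted or how the topic similarity is weighted, so the sketch assumes each document is already given as a list of (topic weight, term vector) pairs, uses cosine similarity between term vectors, and takes the product of the two topic weights as the edge weighting. All function names are hypothetical. The optimal matching is found by exhaustive search, which is only practical for a handful of topics per document; a Hungarian-algorithm solver would be the usual replacement for larger inputs.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def edge_weights(doc_a, doc_b):
    """Weight matrix of the bipartite graph between the topics of two documents.

    Each document is a list of (topic_weight, term_vector) pairs.
    Assumed weighting (not taken from the paper): the product of the two
    topic weights times the cosine similarity of the term vectors.
    """
    return [[wa * wb * cosine(ta, tb) for wb, tb in doc_b] for wa, ta in doc_a]


def best_matching_score(weights):
    """Total weight of a maximum-weight matching, found by exhaustive search.

    Assumes non-negative edge weights, so matching every node on the
    smaller side never lowers the total.
    """
    if not weights or not weights[0]:
        return 0.0
    # Transpose if needed so that rows form the smaller side of the graph.
    if len(weights) > len(weights[0]):
        weights = [list(col) for col in zip(*weights)]
    n_rows, n_cols = len(weights), len(weights[0])
    return max(
        sum(weights[i][j] for i, j in enumerate(cols))
        for cols in permutations(range(n_cols), n_rows)
    )


def document_similarity(doc_a, doc_b):
    """Document similarity as the score of the optimal topic matching."""
    return best_matching_score(edge_weights(doc_a, doc_b))


if __name__ == "__main__":
    # Two toy documents with two hand-made topics each (illustrative data only).
    doc_a = [(0.6, {"graph": 1.0, "matching": 1.0}), (0.4, {"retrieval": 1.0})]
    doc_b = [(0.5, {"retrieval": 1.0, "query": 1.0}), (0.5, {"graph": 1.0})]
    print(document_similarity(doc_a, doc_b))
```

In this toy input the graph-related topic of the first document pairs with the graph topic of the second, and the two retrieval topics pair with each other, so the score is the sum of those two edge weights. The score is left unnormalized here; whether and how LDSM normalizes it is not stated in the abstract.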