Abstract:
In Web information retrieval, the anchor text in a page is highly relevant to the page content, yet most retrieval systems ignore its contribution. This paper proposes a method to improve retrieval accuracy: it builds two indices for the document collection, one based on page content and the other on anchor text, and applies a parallel retrieval strategy to the two indices. Experimental results show that the method handles Web page collections with a specific structure effectively.
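The abstract does not say how the two indices are built or how their result lists are merged. The sketch below shows one minimal way to realize the idea, under stated assumptions: each link's anchor text is credited to the page it points to (a common convention in Web retrieval; the paper may attribute it differently), both indices score documents with a plain TF-IDF sum, and the two result lists are combined by a weighted linear fusion with a hypothetical weight `alpha`. All names (`InvertedIndex`, `build_indices`, `parallel_search`, `alpha`) are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Lowercase word tokenizer (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Minimal inverted index mapping term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(Counter)
        self.doc_ids = set()

    def add(self, doc_id, text):
        self.doc_ids.add(doc_id)
        for term in tokenize(text):
            self.postings[term][doc_id] += 1

    def search(self, query):
        """Score documents by a plain TF-IDF sum (assumed scoring function)."""
        n_docs = len(self.doc_ids)
        scores = Counter()
        for term in tokenize(query):
            posting = self.postings.get(term)  # .get avoids creating empty entries
            if not posting:
                continue
            idf = math.log(1 + n_docs / len(posting))
            for doc_id, tf in posting.items():
                scores[doc_id] += tf * idf
        return scores


def build_indices(pages, links):
    """pages: {doc_id: body text}; links: [(source_id, target_id, anchor_text)].

    Assumption: anchor text is credited to the target page it describes.
    """
    content_index, anchor_index = InvertedIndex(), InvertedIndex()
    for doc_id, text in pages.items():
        content_index.add(doc_id, text)
    for _source, target, anchor in links:
        anchor_index.add(target, anchor)
    return content_index, anchor_index


def normalize(scores):
    """Divide by the top score so both indices' scores fall in (0, 1]."""
    top = max(scores.values(), default=0.0)
    return {doc: s / top for doc, s in scores.items()} if top > 0 else {}


def parallel_search(content_index, anchor_index, query, alpha=0.6, top_k=10):
    """Query both indices concurrently, then fuse with a weighted sum.

    alpha (hypothetical) weights the content score; 1 - alpha weights anchor text.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        content_future = pool.submit(content_index.search, query)
        anchor_future = pool.submit(anchor_index.search, query)
        content = normalize(content_future.result())
        anchor = normalize(anchor_future.result())
    fused = {
        doc: alpha * content.get(doc, 0.0) + (1 - alpha) * anchor.get(doc, 0.0)
        for doc in content.keys() | anchor.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


if __name__ == "__main__":
    pages = {
        "p1": "welcome to our campus homepage",
        "p2": "a blog post about university rankings",
    }
    links = [("p2", "p1", "beijing university official site")]
    content_idx, anchor_idx = build_indices(pages, links)
    # p1's body never mentions the query terms; only the anchor index finds it.
    for doc, score in parallel_search(content_idx, anchor_idx, "beijing university"):
        print(doc, round(score, 3))
```

The toy data illustrates why a second index can help: page `p1` never mentions the query terms in its body, so a content-only search misses it, while the anchor-text index still retrieves it. Normalizing each score list before fusing keeps one index from dominating simply because its raw scores are on a larger scale.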
Key words: anchor text, parallel retrieval, information retrieval
GAO Shan; HE Ting-ting; HU Wen-min. Parallel Retrieval Strategy Based on Anchor Text[J]. Computer Engineering, 2008, 34(19): 30-31,3.