Abstract: This paper analyzes the PageRank algorithm used by the Google search engine, identifies three problems with it, and reviews the improvements that have been proposed for each. It then proposes an improved PageRank algorithm that incorporates anchor text similarity, and uses Nutch to compare the traditional and improved algorithms experimentally. The results show that the improved algorithm raises the precision of search results and helps reduce topic drift.
Key words: PageRank algorithm, anchor text, similarity, topic drift
WANG Zhong-Fei, WANG Biao. Improved PageRank Algorithm Based on Anchor Texts Similarity[J]. Computer Engineering, 2010, 36(24): 258-260.