Abstract:
Nutch is an open-source search engine implemented in Java. To address two shortcomings of the current Nutch, namely that it segments Chinese text into individual characters and does not implement PageRank computation, this paper improves the PageRank algorithm, designs and implements a MapReduce-based PageRank computation method, and improves Chinese word segmentation in Nutch by adding the JE Chinese word segmenter. Experimental results show that the improved Nutch achieves higher query-result accuracy and better ranking of Chinese Web pages.
Key words:
Nutch search engine,
MapReduce model,
PageRank algorithm,
JE Chinese word segmenter
PAN Tao (潘涛), LIANG Zheng-You (梁正友). Improvement Method of Web Page Ranking Quality on Nutch (Nutch中网页排序效果的改进方法)[J]. Computer Engineering (计算机工程), 2010, 36(13): 42-44.
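
The abstract mentions a MapReduce-based PageRank computation but gives no details of it. As a rough illustration only, the sketch below shows how one iteration of the standard (unimproved) PageRank formula can be split into a map step and a reduce step. It is not the authors' improved algorithm and does not use the Hadoop or Nutch APIs; the function names, the damping factor of 0.85, and the in-memory "shuffle" are all assumptions made for this example.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional damping factor; an assumption, not taken from the paper


def map_phase(page, rank, out_links):
    """Emit the page's own link list plus an equal rank share for each out-link."""
    yield page, ("links", out_links)
    if out_links:
        share = rank / len(out_links)
        for target in out_links:
            yield target, ("rank", share)


def reduce_phase(values, num_pages):
    """Sum the incoming rank shares for one page and apply the damping formula."""
    incoming = 0.0
    for kind, payload in values:
        if kind == "rank":
            incoming += payload
    return (1.0 - DAMPING) / num_pages + DAMPING * incoming


def pagerank_iteration(graph, ranks):
    """Run one map/shuffle/reduce round over an in-memory graph."""
    grouped = defaultdict(list)  # stands in for the MapReduce shuffle step
    for page, out_links in graph.items():
        for key, value in map_phase(page, ranks[page], out_links):
            grouped[key].append(value)
    n = len(graph)
    return {page: reduce_phase(grouped[page], n) for page in graph}


def pagerank(graph, iterations=20):
    """Iterate from a uniform start; pages without out-links simply lose their mass here."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks)
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page link graph: page -> list of pages it links to.
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_graph).items()):
        print(page, round(score, 4))
```

In a real cluster job the map and reduce functions would run as separate tasks and each iteration would be a separate job whose output feeds the next; the single-process loop here only mirrors that data flow.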