
Computer Engineering ›› 2010, Vol. 36 ›› Issue (13): 42-44. doi: 10.3969/j.issn.1000-3428.2010.13.015

• Software Technology and Database •

Improvement Method of Web Page Ranking Quality on Nutch

PAN Tao, LIANG Zheng-you

  1. (School of Computer, Electronics and Information, Guangxi University, Nanning 530004)
  • Online: 2010-07-05 Published: 2010-07-05
  • About the authors: PAN Tao (born 1985), male, master's student, main research interests: high-performance computing and network systems; LIANG Zheng-you, professor, Ph.D.
  • Funding:
    Guangxi Natural Science Foundation project (桂科自0832059)

Abstract: Nutch is an open-source search engine implemented in Java. Nutch currently segments Chinese text into single characters and does not implement PageRank computation. To address these shortcomings, this paper improves the PageRank algorithm, designs and implements a MapReduce-based PageRank computation, and improves Nutch's Chinese word segmentation by adding the JE Chinese word segmenter. Experimental results show that the improved Nutch achieves higher query-result accuracy and better ranking quality for Chinese Web pages.

Key words: Nutch search engine, MapReduce model, PageRank algorithm, JE Chinese word segmenter
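
To make the idea of a MapReduce-based PageRank computation concrete, the following is a minimal in-memory sketch of one common formulation. It is not the authors' improved algorithm, whose details the abstract does not give, and it does not use the Hadoop or Nutch APIs. The function names (`map_phase`, `reduce_phase`, `pagerank`), the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed conventional damping factor; the abstract does not state one.
DAMPING = 0.85


def map_phase(ranks, links):
    """Map step: each page splits its current rank evenly over its out-links
    and emits one (target, contribution) pair per link."""
    for page, rank in ranks.items():
        outlinks = links.get(page, [])
        for target in outlinks:
            yield target, rank / len(outlinks)


def reduce_phase(pairs, pages, damping=DAMPING):
    """Reduce step: sum the contributions received by each page, then apply
    the standard damped PageRank update."""
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(pages)
    return {p: (1 - damping) / n + damping * sums[p] for p in pages}


def pagerank(links, iterations=20, damping=DAMPING):
    """Run a fixed number of map/reduce rounds over a link graph.

    `links` maps each page to the list of pages it links to. Pages without
    out-links are not treated specially here, so their rank is not
    redistributed (a simplification of this sketch).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(ranks, links), pages, damping)
    return ranks
```

Calling `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns a dictionary of scores in which page `c`, which receives links from both `a` and `b`, ranks above page `b`. In a real Hadoop job, each round would be one MapReduce pass over the link database rather than a Python loop.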

CLC Number: