计算机工程 (Computer Engineering), 2010, Vol. 36, Issue 8: 275-277. doi: 10.3969/j.issn.1000-3428.2010.08.096

• Development Research and Design Technology •

Web Site Partition Scheme for Vertical Search Engine

LI Xue-kai, XU Xiao, SUN Chun-qi, ZHANG Wei-zhe, LI Bin

  1. (College of Computer, Harbin Institute of Technology, Harbin 150001)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-04-20 Published: 2010-04-20

Abstract: This paper analyzes the task allocation methods used by traditional search engines and the problems they raise. Based on the characteristics of vertical search engines, it proposes Web site partition, a task allocation method with finer granularity than the traditional approach. The method divides a large Web site into several smaller subsets and assigns these subsets to several crawler nodes for parallel crawling, which increases the overall download rate of the crawler system and serves as an effective optimization of the traditional method. The Web site partition algorithm is applied to sample data sets to verify its effectiveness.

Key words: vertical search engine, task allocation, Web site partition, crawler
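
The abstract does not spell out the partition algorithm itself, so the following is only a minimal sketch of the general idea, under stated assumptions: a site's URLs are grouped by their first path segment, oversized groups are cut into chunks, and the resulting subsets are handed to crawler nodes with a greedy least-loaded rule. The function names `partition_site` and `assign_to_crawlers`, the path-prefix grouping, and the greedy balancing are all illustrative choices, not the authors' method.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse


def partition_site(urls, max_subset_size):
    """Split one site's URLs into subsets of at most max_subset_size.

    Assumption: URLs are grouped by first path segment (e.g. "/news/...").
    The paper's actual partition criterion is not given in the abstract.
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        key = segments[0] if segments else ""  # "" holds root-level pages
        groups[key].append(url)

    subsets = []
    for key in sorted(groups):
        members = groups[key]
        # Cut groups that are still too large into fixed-size chunks.
        for start in range(0, len(members), max_subset_size):
            subsets.append(members[start:start + max_subset_size])
    return subsets


def assign_to_crawlers(subsets, num_crawlers):
    """Give each subset to the currently least-loaded crawler node.

    Assumption: load is measured as the number of URLs assigned so far.
    """
    heap = [(0, node) for node in range(num_crawlers)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_crawlers)}
    # Placing the largest subsets first tends to balance the final loads.
    for subset in sorted(subsets, key=len, reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].extend(subset)
        heapq.heappush(heap, (load + len(subset), node))
    return assignment


if __name__ == "__main__":
    site = ["http://example.com/"]
    site += [f"http://example.com/news/{i}" for i in range(6)]
    site += [f"http://example.com/blog/{i}" for i in range(3)]
    plan = assign_to_crawlers(partition_site(site, max_subset_size=4), 2)
    for node, urls in plan.items():
        print(node, len(urls))
```

In this sketch a single large site is spread over several nodes instead of being handed to one node as a whole, which is the granularity difference the abstract describes. A real crawler would also need per-host politeness limits, which are outside the scope of this illustration.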
