Computer Engineering ›› 2010, Vol. 36 ›› Issue (1): 280-282. doi: 10.3969/j.issn.1000-3428.2010.01.097

• Development Research and Design Technology •

Web Crawler System Based on Linux

WANG Feng (王锋), WANG Wei (王伟), ZHANG Jing (张璟), LUO Zuo-min (罗作民)

  1. (College of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-01-05 Published: 2010-01-05

Abstract: To address the key factors that currently limit crawler efficiency, this paper studies the internal operating mechanism of crawler programs, optimizes the architecture, and improves the related algorithms. Tests of the implemented crawler in a Linux network environment indicate that the proposed solution and improvements are feasible, and that they raise both page-fetching efficiency and the overall performance of the crawler.

Key words: Web crawler, URL dispatch, DNS resolution, Hash algorithm

CLC Number: