Abstract: This paper addresses the key factors that currently limit the efficiency of web crawlers. Based on a study of the crawler's internal operating mechanism, it optimizes the crawler architecture and improves the crawler's related algorithms. Tests of the implemented crawler in a Linux network environment show that the proposed solution and improvements are feasible. They also show that the changes increase page-fetching efficiency and improve the crawler's overall performance.
Key words: Web crawler, URL scheduling, DNS resolution, Hash algorithm
WANG Feng, WANG Wei, ZHANG Jing, LUO Zuo-min. Web Crawler System Based on Linux[J]. Computer Engineering, 2010, 36(1): 280-282.