
Computer Engineering ›› 2008, Vol. 34 ›› Issue (19): 47-48,5. doi: 10.3969/j.issn.1000-3428.2008.19.017

• Software Technology and Database •

Design of Distributed Parallel Web Spider Based on ProActive

ZHANG Lin-cai1,2, LIANG Zheng-you1   

  1. School of Computer and Electronic Information, Guangxi University, Nanning 530004; 2. School of Computer and Communication Engineering, Liaoning Shihua University, Fushun 113001
  • Online: 2008-10-05 Published: 2008-10-05


Abstract: A Web Spider running on a single machine collects data slowly, while developing a distributed Web Spider with MPI or directly in Java is costly. This paper designs and implements P-Spider, a distributed parallel Web Spider built on the active object technology, network parallel computing technology, and automatic deployment mechanism provided by the ProActive middleware. Experimental results show that the data collection rate of P-Spider is 2.2 times that of a single-machine multi-threaded Web Spider.

Key words: Web Spider programme, ProActive middleware, parallel, distributed

