Abstract: A Web Spider running on a single machine collects data slowly, and developing a distributed Web Spider with MPI or directly in Java is costly. This paper designs and implements P-Spider, a distributed parallel Web Spider, using the active object technology, network parallel computing technology, and automatic deployment mechanism provided by the ProActive middleware. Experimental results show that the data collection rate of P-Spider is 2.2 times that of a single-machine multi-threaded Web Spider.
Key words:
Web Spider program,
ProActive middleware,
parallel,
distributed
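
The active-object idea named in the abstract can be illustrated with a minimal sketch. It does not use the ProActive API and is not the paper's implementation: it only imitates the pattern with the plain JDK, under the assumption that each worker serves its requests on its own thread and that calls return futures immediately. The names `ActiveSpiderSketch`, `Worker`, `crawl`, and the `fetcher` stand-in are hypothetical, and the workers here are local threads rather than remote nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class ActiveSpiderSketch {

    /** Hypothetical worker: a single thread serves its request queue, as an active object would. */
    static final class Worker implements AutoCloseable {
        private final ExecutorService queue = Executors.newSingleThreadExecutor();
        private final Function<String, Integer> fetcher;

        Worker(Function<String, Integer> fetcher) {
            this.fetcher = fetcher;
        }

        /** Asynchronous call: returns at once with a future for the fetched page size. */
        CompletableFuture<Integer> fetch(String url) {
            return CompletableFuture.supplyAsync(() -> fetcher.apply(url), queue);
        }

        @Override
        public void close() {
            queue.shutdown();
        }
    }

    /** Spreads the URLs round-robin over the workers, then waits for every result. */
    public static Map<String, Integer> crawl(List<String> urls, int workerCount,
                                             Function<String, Integer> fetcher) {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            workers.add(new Worker(fetcher));
        }
        try {
            Map<String, CompletableFuture<Integer>> pending = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                String url = urls.get(i);
                pending.put(url, workers.get(i % workerCount).fetch(url));
            }
            Map<String, Integer> sizes = new LinkedHashMap<>();
            // join() blocks only when a result is actually needed.
            pending.forEach((url, future) -> sizes.put(url, future.join()));
            return sizes;
        } finally {
            workers.forEach(Worker::close);
        }
    }

    public static void main(String[] args) {
        List<String> urls = List.of("http://example.com/a", "http://example.com/bb");
        // Stand-in fetcher (assumption): uses the URL length instead of downloading a page.
        System.out.println(crawl(urls, 2, String::length));
    }
}
```

In this sketch the caller is never blocked while a worker is busy; it blocks only at `join()`, which is the behaviour that lets several workers fetch pages in parallel.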
ZHANG Lin-cai; LIANG Zheng-you. Design of Distributed Parallel Web Spider Based on ProActive[J]. Computer Engineering, 2008, 34(19): 47-48,5.