Abstract:
Distributed parallel Web Spiders built around a center node scale poorly: the center node is overloaded and the communication load is unbalanced. To address these problems, this paper proposes an improved URL deduplication algorithm based on the Rabin fingerprint algorithm, together with an improved scheme that uses a Peer-to-Peer (P2P) node structure. The improved distributed parallel Web Spider is designed and implemented with the ProActive middleware. Comparative experiments show that the improved Web Spider achieves higher collection efficiency and a balanced communication load, has no node bottleneck, and offers good expandability.
Key words:
Web Spider, ProActive middleware, Peer-to-Peer (P2P), distributed, center node
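As a rough illustration of the URL deduplication idea named in the abstract, the sketch below fingerprints each URL with a Rabin fingerprint (the URL's bytes read as a polynomial over GF(2), reduced modulo a fixed irreducible polynomial) and keeps the fingerprints in a set. It is not the paper's algorithm: the polynomial x^64 + x^4 + x^3 + x + 1, the bit-at-a-time reduction, and the names `rabin_fingerprint` and `UrlDeduplicator` are assumptions chosen for the example, and it involves no ProActive code.

```python
# Assumed modulus: P(x) = x^64 + x^4 + x^3 + x + 1, irreducible over GF(2).
POLY_DEGREE = 64
POLY_LOW_BITS = 0x1B  # the low-order terms x^4 + x^3 + x + 1
MASK = (1 << POLY_DEGREE) - 1


def rabin_fingerprint(data: bytes) -> int:
    """Return data(x) mod P(x) over GF(2), consuming one bit at a time."""
    fp = 0
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            # Coefficient of x^63; after multiplying by x it becomes x^64.
            overflow = fp >> (POLY_DEGREE - 1)
            fp = ((fp << 1) & MASK) | bit
            if overflow:
                # x^64 is congruent to x^4 + x^3 + x + 1 modulo P(x).
                fp ^= POLY_LOW_BITS
    return fp


class UrlDeduplicator:
    """Hypothetical helper: remembers fingerprints of URLs already seen."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, url: str) -> bool:
        """Return True the first time a URL is offered, False afterwards."""
        fp = rabin_fingerprint(url.encode("utf-8"))
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


if __name__ == "__main__":
    dedup = UrlDeduplicator()
    for url in ["http://example.com/a", "http://example.com/b", "http://example.com/a"]:
        print(url, dedup.is_new(url))
```

Storing 64-bit fingerprints instead of full URL strings keeps the visited set compact, at the cost of a small probability that two distinct URLs collide and one is skipped.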
摘要: 针对带中心节点结构的分布式并行Web Spider的中心节点负担过重、通信负载不均衡、可扩展性差的问题,提出基于Rabin指纹算法的URL去重改进算法和节点对等结构的改进方案,利用ProActive中间件设计开发改进的分布式并行Web Spider。对比实验表明,改进后的Web Spider采集效率更高,通信负载均衡,无节点瓶颈问题,具有良好的可扩展性。
关键词: 网络蜘蛛, ProActive中间件, 节点对等, 分布式, 中心节点
ZHANG Lin-Cai, LIANG Zheng-You, WANG Hong-Xia. Improvement of ProActive-based P-Spider1.0[J]. Computer Engineering, 2010, 36(17): 288-290.
张林才, 梁正友, 王红霞. 基于ProActive的P-Spider1.0改进[J]. 计算机工程, 2010, 36(17): 288-290.