[1] REINSEL D,GANTZ J,RYDNING J.Data age 2025:the evolution of data to life-critical don't focus on big data;focus on the data that's big[EB/OL].[2018-11-08].https://assets.ey.com/content/dam/ey-sites/ey-com/en_gl/topics/workforce/Seagate-WP-DataAge2025-March-2017.pdf. [2] MITCHELL R.Web scraping with python[M].Sebastopol,USA:O'Reilly Media,Inc.,2016. [3] SHKAPENYUK V,SUEL T.Design and implementation of a high-performance distributed Web crawler[C]//Proceedings of the 18th International Conference on Data Engineering.Washington D.C.,USA:IEEE Press,2002:357. [4] SHI Yuliang,ZHANG Ti.Design and implementation of a scalable distributed Web crawler based on Hadoop[C]//Proceedings of the 2nd International Conference on Big Data Analysis.Washington D.C.,USA:IEEE Press,2017:537-541. [5] LIN K,CHUNG S,LIN C.A fast and distributed algorithm for mining frequent patterns in congested networks[J].Computing,2016,98(3):235-256. [6] JAGANATHAN P,KARTHIKEYAN T.Highly efficient architecture for scalable focused crawling using incremental parallel Web crawler[J].Journal of Computer Science,2015,11(1):120-126. [7] 董禹龙,杨连贺,马欣.主动获取式的分布式网络爬虫集群方法研究[J].计算机科学,2018,45(S1):428-432. [8] 李婷.分布式爬虫任务调度与AJAX页面抓取研究[D].成都:电子科技大学,2015. [9] KARGER D,RUHL M.Simple efficient load balancing algorithms for peer-to-peer systems[C]//Proceedings of the 16th ACM Symposium on Parallel Algorithms and Architectures.New York,USA:ACM Press,2004:27-30. [10] 王霓虹,张露露.分布式爬虫任务调度策略的优化[J].黑龙江大学自然科学学报,2016,33(5):671-675,701. [11] NASRI M,SHARIFI M.Load balancing using consistent hashing:a real challenge for large scale distributed Web crawlers[C]//Proceedings of 2009 International Conference on Advanced Information Networking and Applications Workshops.New York,USA:ACM Press,2009:715-720. [12] YING Zheyu,ZHANG Fengli,FAN Qingyu.Consistent hashing algorithm based on slice in improving scrapy-redis distributed crawler efficiency[C]//Proceedings of 2018 IEEE International Conference on Computer and Communication Engineering Technology.Washington D.C.,USA:IEEE Press,2018:334-340. [13] MAKRIS A,TSERPES K,ANAGNOSTOPOULOS D,et al.Load balancing for minimizing the average response time of get operations in distributed key-value stores[C]//Proceedings of the 14th International Conference on Networking,Sensing and Control.Washington D.C.,USA:IEEE Press,2017:263-269. [14] HU Licong,XU Yajing,XU Huimin.A dynamic load balancing algorithm based on consistent hash[C]//Proceedings of the 2nd IEEE Advanced Information Management,Communicates,Electronic and Automation Control Conference.Washington D.C.,USA:IEEE Press,2018:2387-2391. [15] GE Dajie,DING Zhijun.A task scheduling strategy based on weighted round robin for distributed crawler[C]//Proceedings of Concurrency and Computation-Practice and Experience.Washington D.C.,USA:IEEE Press,2015:848-852. [16] 付志辉.分布式爬虫的动态负载均衡方法研究[D].哈尔滨:哈尔滨工业大学,2014. [17] 孙守兴.基于可扩展哈希算法的并行爬虫动态负载均衡实现[D].哈尔滨:哈尔滨工业大学,2010. [18] SU Linping,WANG Fengxiao.Web crawler model of fetching data speedily based on Hadoop distributed system[C]//Proceedings of the 7th IEEE International Conference on Software Engineering and Service Science.Washington D.C.,USA:IEEE Press,2016:927-931. [19] LIU Lina,LIU Xuemin,ZHANG Shibo,et al.High efficient distributed cache system based on Redis cluster[J].Computer Systems and Applications,2018,27(10):91-98. [20] 闫明.高可用可扩展集群化Redis设计与实现[D].西安:西安电子科技大学,2014. [21] ZHUO Guirong,WANG Bingxue.Structure designing of BP neural network in the application of reference velocity estimation[C]//Proceedings of 2014 IEEE International Conference on Mechatronics and Automation.Washington D.C.,USA:IEEE Press,2014:1481-1485. [22] NICLSCN R.Kolmogorov's mapping neutral network existence theorem[C]//Proceedings of the 1st International Conference on Neural Networks.Washington D.C.,USA:IEEE Press,1987:11-13. |