Research on WatiJ-based Spider for Deep Web

doi:10.3969/j.issn.1000-3428.2011.04.095

Computer Engineering, 2011, Vol. 37, Issue 4: 264-266.

• Networks and Communications •

LIU Shao-bin, ZHANG Zu-ping, LONG Jun

(School of Information Science and Engineering, Central South University, Changsha 410083, China)

Online: 2011-02-20 Published: 2011-02-17

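About the authors: LIU Shao-bin (刘邵斌, born 1983), male, master's student, main research interest: vertical search; ZHANG Zu-ping (张祖平), professor, Ph.D., doctoral supervisor; LONG Jun (龙军), associate professor, Ph.D.
Supported by: National Natural Science Foundation of China (60873081, 60970095, M0921 005); Natural Science Foundation of Hunan Province (07JJ6122)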

Abstract

Abstract: A significant part of the content of the Deep Web cannot be crawled effectively because it sits behind dynamic Web pages. To address this problem, a Deep Web spider based on WatiJ, a Web automated testing tool, is designed and implemented. The principle of using WatiJ to imitate user interactions, such as submitting query forms and repeatedly clicking the next-page button, is described, and the key steps of crawling dynamic Web pages are illustrated with examples. Experimental results show that the spider is an effective solution for crawling dynamic Web pages from authorized data sources.

Key words: dynamic webpage, automated test, Web spider
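
The interaction pattern named in the abstract (submit a query form, then keep clicking the next-page button until no further page exists) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation and not WatiJ's API: the `BrowserSession` interface, the `crawl_result_pages` function, the `max_pages` limit, and the in-memory `FakeBrowser` are all hypothetical names introduced here. In a real spider, a browser-automation tool such as WatiJ would sit behind the interface.

```python
from typing import Dict, List, Protocol


class BrowserSession(Protocol):
    """Hypothetical stand-in for a browser-automation driver (NOT WatiJ's API)."""

    def open(self, url: str) -> None: ...
    def submit_query(self, field_name: str, value: str) -> None: ...
    def page_source(self) -> str: ...
    def click_next(self) -> bool: ...


def crawl_result_pages(browser: BrowserSession, url: str, field_name: str,
                       query: str, max_pages: int = 50) -> List[str]:
    """Submit one query and collect every result page reachable via 'next'."""
    browser.open(url)
    browser.submit_query(field_name, query)  # imitate a user filling in the form
    pages: List[str] = []
    while len(pages) < max_pages:            # assumed safety limit against endless paging
        pages.append(browser.page_source())
        if not browser.click_next():         # no next-page button: last page reached
            break
    return pages


class FakeBrowser:
    """In-memory fake used only to exercise the loop without a real browser."""

    def __init__(self, results_by_query: Dict[str, List[str]]) -> None:
        self._results = results_by_query
        self._pages: List[str] = ["<html>form</html>"]
        self._index = 0

    def open(self, url: str) -> None:
        # Opening the site shows the query form and resets paging state.
        self._pages = ["<html>form</html>"]
        self._index = 0

    def submit_query(self, field_name: str, value: str) -> None:
        # Unknown queries yield a single "no results" page.
        self._pages = self._results.get(value, ["<html>no results</html>"])
        self._index = 0

    def page_source(self) -> str:
        return self._pages[self._index]

    def click_next(self) -> bool:
        # Advance only if another result page exists.
        if self._index + 1 < len(self._pages):
            self._index += 1
            return True
        return False
```

Keeping the paging loop separate from the driver means the same loop can be checked against a fake before it is pointed at an authorized data source.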


CLC Number: N945

LIU Shao-bin, ZHANG Zu-ping, LONG Jun. Research on WatiJ-based Spider for Deep Web[J]. Computer Engineering, 2011, 37(4): 264-266.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2011.04.095

http://www.ecice06.com/EN/Y2011/V37/I4/264
