Abstract:
Because client-side scripts are now widely used in Web pages, traditional search engines have difficulty extracting the information those scripts generate. To address this problem, this paper uses HtmlUnit to interpret JavaScript-driven dynamic Web pages and Selenium IDE to obtain the XPath of dynamic elements, so that a job-seeking search engine can successfully extract dynamically generated page content. Experimental results show that the technique is effective.
Key words:
dynamic Web page,
information extraction,
job seeking,
search
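Abstract: Traditional search engines have difficulty extracting information generated by client-side scripts. In the course of developing a job-seeking search engine, this work uses HtmlUnit to parse JavaScript dynamic Web pages and Selenium IDE to extract the XPath of dynamic elements, which resolves that difficulty for content generated dynamically on the client. Experimental results demonstrate that the technique is effective.
Key words:
dynamic Web page,
information extraction,
job seeking,
search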
FANG Hong; LV Tai-zhi. Application of Dynamic Web Page Information Extraction Technology in Seeking-job Search[J]. Computer Engineering, 2009, 35(24): 265-267.
方 宏;吕太之. 动态网页信息提取技术在求职搜索中的应用[J]. 计算机工程, 2009, 35(24): 265-267.