
Computer Engineering, 2008, Vol. 34, Issue (7): 56-58.

Software Technology and Database

Deep Web Sources Focused Crawler

LIN Chao, ZHAO Peng-peng, CUI Zhi-ming   

  1. (Institute of Intelligent Information Processing and Application, Suzhou University, Suzhou 215006)
  Online: 2008-04-05   Published: 2008-04-05

Abstract: Many pages on the Internet are generated dynamically from back-end databases and cannot be reached by traditional search engines; these pages are collectively called the Deep Web. Data source discovery is a key step in integrating Deep Web data sources at large scale. This paper proposes a focused crawling algorithm for Deep Web sources. When evaluating the importance of a hyperlink, the algorithm considers both the relevance of the page to the topic and information related to the link itself. Experiments indicate that the method is effective.

Key words: Deep Web sources, focused crawler, Bayes classifier
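The abstract names the ingredients of the link-scoring step (a Bayes classifier, page-topic relevance, and link-related information) but does not give the exact formula. The sketch below illustrates the general idea under stated assumptions, and it is not the authors' method. It uses a multinomial naive Bayes classifier trained on toy "topic" vs "other" snippets. Link context is assumed to be the anchor text plus the URL tokens. The two signals are combined as a linear blend with a hypothetical weight `alpha`. `NaiveBayes`, `link_score`, `Frontier`, and the training strings are illustrative names and data.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over two classes,
    'topic' (relevant) and 'other'. Both classes need at least one document."""

    def __init__(self):
        self.word_counts = {"topic": Counter(), "other": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def prob_topic(self, text):
        """Posterior P(topic | text), computed in log space for stability."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total_words + v))
            log_scores[label] = score
        # Normalize the two log scores into a probability.
        m = max(log_scores.values())
        exp = {k: math.exp(s - m) for k, s in log_scores.items()}
        return exp["topic"] / (exp["topic"] + exp["other"])


def link_score(clf, page_text, anchor_text, url, alpha=0.5):
    """Hypothetical blend: alpha weights the relevance of the page the link
    was found on, (1 - alpha) weights the link's own context (anchor + URL)."""
    page_rel = clf.prob_topic(page_text)
    link_rel = clf.prob_topic(anchor_text + " " + url)
    return alpha * page_rel + (1 - alpha) * link_rel


class Frontier:
    """Max-priority queue of URLs; heapq is a min-heap, so scores are negated."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:  # enqueue each URL at most once
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Toy training data (illustrative only).
clf = NaiveBayes()
clf.train("search books by title author isbn form query", "topic")
clf.train("find flights departure arrival date search form", "topic")
clf.train("company history press release contact us", "other")
clf.train("privacy policy terms of use copyright", "other")

page = "advanced search form for books: enter title or author"
frontier = Frontier()
for anchor, link in [("advanced book search", "http://example.com/books/search"),
                     ("about us", "http://example.com/about")]:
    frontier.push(link, link_score(clf, page, anchor, link))
frontier.push("http://example.com/about", 1.0)  # duplicate URL is ignored

url, score = frontier.pop()
print(url)  # prints http://example.com/books/search
```

Keeping the frontier in a max-priority queue means the crawler always expands the most promising link next. The linear blend is only one plausible way to combine the two signals.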