A Topic-specific Intelligent Web Crawler System

doi:10.3969/j.issn.1000-3428.2006.03.021

Computer Engineering, 2006, Vol. 32, Issue 3: 57-59.

• Software Technology and Database •


QIAN Rong1, XU Xinhua2, ZHENG Ying3, YANG Bingru1

1. School of Information Engineering, Beijing University of Science and Technology, Beijing 100083; 2. Department of Information Engineering, Guanzhuang Campus, Beijing University of Science and Technology, Beijing 100083; 3. Personnel Department, Jinan University, Jinan 250022

Online: 2006-02-05 Published: 2006-02-05

Original Chinese title: 智能专题化信息搜集 Crawler (Intelligent Topic-specific Information-Gathering Crawler)

Authors (Chinese): 钱榕, 徐新华, 郑莹, 杨炳儒

Abstract

This paper introduces a topic-specific intelligent Web crawler system and its crawling algorithm, which is based on Web content and structure mining. The algorithm exploits two properties of neural networks: they can conveniently model network topology, and they support parallel computation. Reinforcement learning is used to judge the relevance of each crawled page to the topic. When computing relevance, the algorithm does not consider the full content of a Web page. Instead, it extracts the important tags from the page's HTML markup and analyzes the content and structure of the page on that basis, so as to improve the efficiency and accuracy of information collection.

Key words: Topic-specific crawler; Web mining; Neural network; Reinforcement learning
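The abstract's idea of scoring relevance from a page's important HTML tags, rather than from its full text, can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses no neural network and no reinforcement learning, and the tag set, the weights in `TAG_WEIGHTS`, the `relevance` function and the sample page are all assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical weights: which tags count as "important", and by how much,
# is an assumption for this sketch and is not taken from the paper.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "a": 1.0}


class TagTextCollector(HTMLParser):
    """Collects the text found directly inside each weighted tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open weighted tags
        self.texts = []      # (tag, lowercased text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        # Text outside the weighted tags (e.g. plain <p> body text) is ignored.
        if self.open_tags and data.strip():
            self.texts.append((self.open_tags[-1], data.lower()))


def relevance(html, topic_terms):
    """Weighted share of important-tag text chunks that mention a topic term."""
    parser = TagTextCollector()
    parser.feed(html)
    parser.close()
    total = matched = 0.0
    for tag, text in parser.texts:
        weight = TAG_WEIGHTS[tag]
        total += weight
        if any(term.lower() in text for term in topic_terms):
            matched += weight
    return matched / total if total else 0.0


page = (
    "<html><head><title>Neural network crawler</title></head>"
    "<body><h1>Focused crawling</h1>"
    "<p>Unweighted body text about cooking.</p>"
    "<a href='/x'>sports news</a></body></html>"
)
# title (3.0) and h1 (2.0) match, the link text (1.0) does not: 5/6
score = relevance(page, ["crawler", "crawling"])
print(round(score, 2))
```

A crawler built along these lines could use such a score to decide which links to follow first. The learned components described in the abstract would take the place of the fixed weights used here.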

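Abstract (translated from the Chinese original): This paper presents a topic-specific intelligent Web crawler system based on Web content and structure mining, focusing on its CA(C&S) algorithm. The algorithm exploits the ability of neural networks to conveniently model network topology and to compute in parallel, and it uses reinforcement learning to judge the relevance of a Web page to the topic. When computing relevance, it does not consider the full content of the page. Instead, it extracts the important tags from the page's HTML description and analyzes the content and structure of the page, thereby judging the relevance of the crawled page to the topic, so as to improve the efficiency and accuracy of information gathering.

Key words (translated from the Chinese original): Topic-specific crawling; Web mining; Neural network; Reinforcement learning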

QIAN Rong, XU Xinhua, ZHENG Ying, YANG Bingru. A Topic-specific Intelligent Web Crawler System[J]. Computer Engineering, 2006, 32(3): 57-59.

钱榕，徐新华，郑莹，杨炳儒. 智能专题化信息搜集 Crawler[J]. 计算机工程, 2006, 32(3): 57-59.


URL: https://www.ecice06.com/EN/Y2006/V32/I3/57
