Preliminary Design of A Context-Graph-based Focused Crawler

doi:10.3969/j.issn.1000-3428.2006.12.079

Computer Engineering, 2006, Vol. 32, Issue 12: 208-209, 228.

• Artificial Intelligence and Recognition Technology •

LI Daosheng, ZHAO Qiang

Institute of Computer Applications, China Academy of Engineering Physics, Mianyang 621900

Online: 2006-06-20 Published: 2006-06-20

基于语景图的主题爬取器的初步设计

李道生，赵强

中国工程物理研究院计算机应用研究所，绵阳621900

Abstract

Abstract: This paper presents a preliminary design of a focused crawler based on a context graph. The crawler relies on a set of Naive Bayes classifiers, built with both the vector space model (VSM) and a probabilistic model so that the two designs can be compared. Within each layer of the context graph, the frontier priority queue is sorted by the cosine similarity between the normalized vector of a downloaded document and the query vector. An approach to classifying crawl results into a predefined category directory is also presented.

Key words: Focused crawling; Machine learning; Context graph
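
The queue ordering described in the abstract can be illustrated with a minimal sketch. It assumes plain term-frequency vectors and whitespace tokenization, neither of which is specified in the abstract, and the names `LayerFrontier`, `normalize`, and `cosine` are hypothetical, not taken from the paper. Each document vector is scaled to unit length, so the dot product with the unit-length query vector is the cosine similarity; one such queue would exist per context-graph layer.

```python
import heapq
import math
from collections import Counter


def normalize(term_counts):
    """Scale a sparse term-count vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in term_counts.values()))
    if norm == 0:
        return {}
    return {t: v / norm for t, v in term_counts.items()}


def cosine(doc_vec, query_vec):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    if len(doc_vec) > len(query_vec):
        doc_vec, query_vec = query_vec, doc_vec
    return sum(w * query_vec.get(t, 0.0) for t, w in doc_vec.items())


class LayerFrontier:
    """Priority queue for one context-graph layer (hypothetical name)."""

    def __init__(self, query_text):
        # Assumption: the query is tokenized the same way as documents.
        self.query_vec = normalize(Counter(query_text.lower().split()))
        self._heap = []
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, text):
        """Score a downloaded document and queue its URL; returns the score."""
        doc_vec = normalize(Counter(text.lower().split()))
        score = cosine(doc_vec, self.query_vec)
        # heapq is a min-heap, so the negated score is stored.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1
        return score

    def pop(self):
        """Return the (url, score) pair with the highest cosine similarity."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score
```

In this sketch a document sharing no terms with the query scores 0 and is popped last, while documents dominated by query terms are popped first.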

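Abstract (translated from the Chinese): This paper introduces a preliminary design of a Web focused crawler based on a context graph. It describes the vector space model used for text learning by the NB classifiers, the Bernoulli model, and the design of the Naive Bayes classifiers. It proposes a simplified scheme for prioritizing the frontier queue: the cosine similarity between the normalized document vector of a downloaded document and the query vector serves as the ranking criterion for downloaded documents within a layer, so that this ranking can be compared with ranking by the class likelihood scores of the documents in each layer's queue. It also outlines an idea for automatically integrating crawl results into a topic category directory.

Key words (translated from the Chinese): focused crawling; machine learning; context graph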
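
The Naive Bayes side of the design can be sketched in the same spirit. The example below is a generic multivariate Bernoulli Naive Bayes classifier with Laplace smoothing, not the authors' implementation; it assumes that each class label stands for a context-graph layer (0 for target pages, 1 for pages one link away), that tokens are whitespace-separated, and that the class name `BernoulliNB` and its methods are illustrative only. The per-class log scores it returns play the role of the class likelihood scores against which the cosine ranking is compared.

```python
import math


class BernoulliNB:
    """Multivariate Bernoulli Naive Bayes with Laplace smoothing (sketch)."""

    def fit(self, docs, labels):
        # The vocabulary is every distinct token seen in training.
        self.vocab = sorted({t for d in docs for t in d.split()})
        self.classes = sorted(set(labels))
        n = len(docs)
        self.log_prior = {}
        self.p_term = {}
        for c in self.classes:
            class_docs = [set(d.split()) for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / n)
            # P(term present | class), smoothed by adding 1 to the document
            # count and 2 to the class size.
            self.p_term[c] = {
                t: (sum(t in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for t in self.vocab
            }
        return self

    def log_scores(self, doc):
        """Log prior plus log likelihood of the document under each class."""
        present = set(doc.split())
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for t in self.vocab:
                p = self.p_term[c][t]
                # The Bernoulli model also scores the absence of a term.
                s += math.log(p if t in present else 1.0 - p)
            scores[c] = s
        return scores

    def predict(self, doc):
        """Return the class (layer) with the highest log score."""
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)
```

A crawler built along these lines could assign each downloaded page to the layer returned by `predict` and then order that layer's queue either by the log score or by cosine similarity, which is the comparison the abstract describes.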

LI Daosheng, ZHAO Qiang. Preliminary Design of A Context-Graph-based Focused Crawler[J]. Computer Engineering, 2006, 32(12): 208-209, 228.

李道生，赵强. 基于语景图的主题爬取器的初步设计[J]. 计算机工程, 2006, 32(12): 208-209，228.

URL:

https://www.ecice06.com/EN/Y2006/V32/I12/208
