Computer Engineering, 2006, Vol. 32, Issue (12): 208-209, 228.

• Artificial Intelligence and Recognition Technology •

Preliminary Design of a Context-Graph-based Focused Crawler

LI Daosheng, ZHAO Qiang   

  1. Institute of Computer Applications, China Academy of Engineering Physics, Mianyang 621900
  • Online: 2006-06-20 Published: 2006-06-20

基于语景图的主题爬取器的初步设计

李道生,赵 强   

  1. 中国工程物理研究院计算机应用研究所,绵阳621900

Abstract: This paper designs a focused crawler that uses a context graph. The crawler is based on a set of Naive Bayes classifiers, which adopt both the vector space model (VSM) and a probability model so that the two designs can be compared. The frontier priority queue within a layer of the context graph is sorted by the cosine similarity between the normalized vector of a downloaded document and the query vector. An approach to classifying search results into a pre-defined category is also presented.
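The within-layer frontier ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `LayerFrontier`, the whitespace tokenizer, and the raw term-count weighting are assumptions made only for this example.

```python
import heapq
import math
from collections import Counter


def normalize(term_counts):
    """Scale a sparse term-count vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in term_counts.values()))
    if norm == 0:
        return {}
    return {t: v / norm for t, v in term_counts.items()}


def cosine(doc_vec, query_vec):
    """For unit-length sparse vectors the dot product is the cosine similarity."""
    if len(doc_vec) > len(query_vec):
        doc_vec, query_vec = query_vec, doc_vec  # iterate over the shorter one
    return sum(w * query_vec.get(t, 0.0) for t, w in doc_vec.items())


class LayerFrontier:
    """Hypothetical priority queue for one layer of the context graph."""

    def __init__(self, query_text):
        # Assumption: plain lowercase whitespace tokens, raw term counts.
        self.query_vec = normalize(Counter(query_text.lower().split()))
        self._heap = []
        self._count = 0  # insertion counter, used only to break score ties

    def push(self, url, page_text):
        doc_vec = normalize(Counter(page_text.lower().split()))
        score = cosine(doc_vec, self.query_vec)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1
        return score

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score
```

In this sketch a crawler would keep one `LayerFrontier` per layer, call `push` for each downloaded page assigned to that layer, and call `pop` to obtain the page most similar to the query.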

Key words: Focused crawling; Machine learning; Context graph

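Abstract (translated from the Chinese): This paper presents a preliminary design of a context-graph-based focused Web crawler. It describes the vector space model used for text learning in the NB classifier, the Bernoulli model, and the design of the Naive Bayes classifier. It proposes a simplified scheme for priority ordering of the frontier queue: the cosine similarity between the normalized document vector of a downloaded document and the query vector serves as the ranking criterion for downloaded documents within a layer, so that this ordering can be compared with ranking by the class likelihood scores of the documents in each layer's queue. It also outlines an approach for automatically integrating crawl results into a topic classification directory.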
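The Bernoulli model named in this abstract can be sketched as a small Naive Bayes classifier over binary term-presence features. This is an illustrative sketch, not the paper's design: treating each context-graph layer as one class of a single multi-class classifier, the Laplace smoothing, and the class name `BernoulliNB` are all assumptions made for the example.

```python
import math


class BernoulliNB:
    """Hypothetical Bernoulli Naive Bayes; each class stands for one layer."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: layer index for each document.
        self.vocab = sorted({t for d in docs for t in d})
        self.classes = sorted(set(labels))
        n = len(docs)
        self.log_prior = {}
        self.p_term = {}
        for c in self.classes:
            class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / n)
            # Laplace-smoothed probability that a term occurs in a class doc.
            self.p_term[c] = {
                t: (sum(t in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for t in self.vocab
            }
        return self

    def log_scores(self, doc):
        """Log of prior times likelihood for each class (layer)."""
        present = set(doc)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for t in self.vocab:
                p = self.p_term[c][t]
                # Bernoulli model: absent terms also contribute, via 1 - p.
                s += math.log(p if t in present else 1.0 - p)
            scores[c] = s
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)
```

Under these assumptions, the values returned by `log_scores` play the role of the class likelihood scores against which the cosine-similarity ordering is compared.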

Key words (translated from the Chinese): Focused crawling; Machine learning; Context graph