Focused Crawling Algorithm Based on Multi-agent System

doi:10.3969/j.issn.1000-3428.2008.16.070

Computer Engineering, 2008, Vol. 34, Issue 16: 204-206. doi: 10.3969/j.issn.1000-3428.2008.16.070

XU Zhao-cai, CHENG Xian-yi

(Computer Science & Communication Engineering Institute, Jiangsu University, Zhenjiang 212013)

Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-08-20 Published: 2008-08-20

Abstract

Focused crawling is a key technology for topic-specific search engines. This paper proposes a focused crawling algorithm based on a multi-agent system. The algorithm fetches topic-relevant pages by filtering on ontology-based semantic topic keywords, and it uses the semantic network of an ontology library to filter synonyms and near-synonyms within the ontology's domain. Topic-relevant pages are predicted from the different weights that HTML tags give to recognized keywords and from hyperlink anchor text. The agents interact through a blackboard communication mechanism. Experimental results show some improvement in the precision and recall of the crawled pages.

Key words: focused crawling, topic keyword filtering, semantics
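The abstract gives no weight values or formulas, so the following is only a minimal Python sketch of tag-weighted keyword matching with synonym folding. The weights in `TAG_WEIGHTS`, the toy `SYNONYMS` table (standing in for the ontology's semantic network), and the names `WeightedTermCounter` and `relevance` are all assumptions made for illustration, not the authors' implementation.

```python
from html.parser import HTMLParser

# Hypothetical weights: the abstract says tags carry different weights
# but does not state the values.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "a": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0

# Toy stand-in for the ontology's semantic network:
# surface term -> canonical topic concept.
SYNONYMS = {"crawler": "crawler", "spider": "crawler", "robot": "crawler"}


class WeightedTermCounter(HTMLParser):
    """Accumulates a tag-weighted count of topic terms found in page text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags that carry a weight
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Text takes the highest weight among its enclosing weighted tags.
        weight = max((TAG_WEIGHTS[t] for t in self.stack), default=DEFAULT_WEIGHT)
        for word in data.lower().split():
            if word.strip(".,;:!?") in SYNONYMS:
                self.score += weight


def relevance(html: str) -> float:
    """Return the tag-weighted number of topic-term hits in an HTML page."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.score
```

In a crawler built along these lines, a page whose score exceeds some threshold would be kept, and the same scoring applied to anchor text alone could rank unvisited links. Both the threshold and the ranking rule would need to be taken from the full paper.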

CLC number: TP301.6

XU Zhao-cai; CHENG Xian-yi. Focused Crawling Algorithm Based on Multi-agent System[J]. Computer Engineering, 2008, 34(16): 204-206.

http://www.ecice06.com/CN/Y2008/V34/I16/204
