Crawler Algorithm Based on Directory Tree in Network Science and Technology Resource

doi:10.3969/j.issn.1000-3428.2009.01.097

Computer Engineering, 2009, Vol. 35, Issue (1): 277-279. doi: 10.3969/j.issn.1000-3428.2009.01.097


Crawler Algorithm Based on Directory Tree in Network Science and Technology Resource

LI Guo-dong, LIU Zhong-qiang, LIU Chang-an

(School of Computer Science and Technology, North China Electric Power University, Beijing 102206)

Received:1900-01-01 Revised:1900-01-01 Online:2009-01-05 Published:2009-01-05

Abstract

Abstract: Network science and technology resources are classified in many different ways and are large in volume. To address these characteristics, this paper proposes a crawler algorithm based on a directory tree. The algorithm uses ontology knowledge supplied by a domain ontology knowledge base as the evaluation criterion for extracting and recognizing valid directory links, then applies a modified link analysis strategy to obtain valid node links and crawl them. The paper also studies the crawler architecture and pays particular attention to optimizing how quickly the latest resources are acquired. Experimental results show that the algorithm effectively improves the speed and efficiency of resource crawling.

Key words: science and technology resource, information crawling, directory tree, ontology
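The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: walk a site's directory structure breadth-first and follow only links whose anchor text matches domain vocabulary. Everything in it is an assumption made for illustration, not the authors' method: the `ONTOLOGY_TERMS` set stands in for the domain ontology knowledge base, `SITE` is a hypothetical in-memory site instead of real HTTP fetching, and the word-overlap `relevance` score and `threshold` value replace the paper's unspecified link analysis strategy. The optimization for acquiring the latest resources is not modeled.

```python
from collections import deque

# Hypothetical domain vocabulary standing in for the ontology knowledge base.
ONTOLOGY_TERMS = {"energy", "power", "grid", "materials"}

# Hypothetical in-memory site: URL -> list of (anchor text, target URL).
SITE = {
    "/": [("Energy", "/energy"), ("Contact us", "/contact"),
          ("Materials", "/materials")],
    "/energy": [("Power grid reports", "/energy/grid"),
                ("Advertising", "/ads")],
    "/energy/grid": [],
    "/materials": [],
    "/contact": [],
    "/ads": [],
}


def relevance(anchor_text, terms):
    """Fraction of words in the anchor text that are ontology terms."""
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(w in terms for w in words) / len(words)


def crawl_directory_tree(site, root, terms, threshold=0.3):
    """Breadth-first walk that follows only links judged relevant.

    Returns (tree, resources): tree maps each directory URL to the child
    URLs that were kept; resources lists kept URLs with no kept children.
    """
    tree, resources = {}, []
    seen = {root}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        children = []
        for text, target in site.get(url, []):
            # Skip links already visited or scored below the threshold.
            if target in seen or relevance(text, terms) < threshold:
                continue
            seen.add(target)
            children.append(target)
            queue.append(target)
        if children or url == root:
            tree[url] = children      # treated as a directory node
        else:
            resources.append(url)     # treated as a resource to collect
    return tree, resources


tree, resources = crawl_directory_tree(SITE, "/", ONTOLOGY_TERMS)
```

In this toy run the "Contact us" and "Advertising" links score zero and are never visited, which is the kind of pruning that ontology-based link evaluation is meant to provide. A real crawler would also need page fetching, HTML link extraction, and a richer relevance measure than word overlap.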

CLC Number: TP301

LI Guo-dong; LIU Zhong-qiang; LIU Chang-an. Crawler Algorithm Based on Directory Tree in Network Science and Technology Resource[J]. Computer Engineering, 2009, 35(1): 277-279.

http://www.ecice06.com/CN/Y2009/V35/I1/277

