Computer Engineering ›› 2012, Vol. 38 ›› Issue (04): 52-54. doi: 10.3969/j.issn.1000-3428.2012.04.017

• Software Technology and Database •

Deep Web Data Sources Discovery Method Based on Ontology

LI Dao-shen, LIU Yong

  1. (College of Electronic Information Engineering, Henan University of Science and Technology, Luoyang 471003, China)
  • Received: 2011-07-19 Online: 2012-02-20 Published: 2012-02-20
  • About the authors: LI Dao-shen (born 1986), male, master's student, main research interest: Web data mining; LIU Yong, professor
  • Supported by:
    National Natural Science Foundation of China (70671035)

Abstract: This paper presents an ontology-based method for discovering Deep Web data sources. It uses webpage classification, form content classification, and form structure classification to identify Deep Web query interfaces that belong to a given domain. Modules for semi-automatic construction and automatic extension of the ontology are introduced into webpage classification and form content classification, and heuristic rules are added to form structure classification. Experimental results show that the method effectively improves both the recall and the precision of Deep Web data source discovery.

Key words: Deep Web, ontology, data sources, semi-automatic construction, classification model
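
To make the three classification steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the ontology can be approximated by a flat set of domain terms, that the three classifiers run one after another as filters, and that a login-form check stands in for the heuristic structure rules. The names `BOOK_ONTOLOGY`, `FormCollector`, `ontology_score`, `is_query_form`, `discover_interfaces` and the threshold values are all hypothetical. The semi-automatic construction and automatic extension of the ontology are not reproduced here.

```python
import re
from html.parser import HTMLParser

# Hypothetical mini-ontology for a "books" domain (an assumption; the paper
# builds and extends its ontology semi-automatically).
BOOK_ONTOLOGY = {"book", "title", "author", "isbn", "publisher"}


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class FormCollector(HTMLParser):
    """Collects page words and, for each <form>, its words and control types."""

    def __init__(self):
        super().__init__()
        self.page_words = []
        self.forms = []
        self._current = None  # form being parsed, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"words": [], "controls": []}
        elif self._current is not None and tag in ("input", "select", "textarea"):
            kind = (attrs.get("type") or "text") if tag == "input" else tag
            self._current["controls"].append(kind.lower())
            # Field names often carry domain vocabulary (e.g. name="author").
            self._current["words"] += tokenize(attrs.get("name") or "")

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_data(self, data):
        words = tokenize(data)
        self.page_words += words
        if self._current is not None:
            self._current["words"] += words


def ontology_score(words, ontology):
    """Fraction of ontology terms that occur among the given words."""
    return len(ontology & set(words)) / len(ontology)


def is_query_form(controls):
    """Assumed structure heuristics: no password field, one fillable field."""
    if "password" in controls:
        return False
    return any(c in ("text", "search", "select", "textarea") for c in controls)


def discover_interfaces(html, ontology, page_min=0.4, form_min=0.4):
    """Return the forms that pass all three (assumed sequential) filters."""
    parser = FormCollector()
    parser.feed(html)
    # Step 1: webpage classification by ontology term coverage.
    if ontology_score(parser.page_words, ontology) < page_min:
        return []
    # Steps 2 and 3: form content classification, then form structure rules.
    return [
        form for form in parser.forms
        if ontology_score(form["words"], ontology) >= form_min
        and is_query_form(form["controls"])
    ]
```

Calling `discover_interfaces(page_html, BOOK_ONTOLOGY)` on a page's HTML returns one dictionary per form judged to be a domain query interface. The cheap page-level check runs first so that off-domain pages are discarded before any form is inspected, which is one plausible reason to order the steps this way.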

CLC Number: