摘要: 提出一种基于本体的Deep Web数据源发现方法,采用网页分类、表单内容分类、表单结构分类方式,确定符合某领域的Deep Web查询接口。在网页分类和表单内容分类中引入本体的半自动构建和自动扩展模块,在表单结构分类中添加启发式规则。实验结果证明,该方法能有效提高Deep Web数据源的查全率和查准率。
关键词: 深网, 本体, 数据源, 半自动构建, 分类模型
Abstract: This paper presents an ontology-based method for discovering Deep Web data sources. It uses webpage classification, form content classification, and form structure classification to identify the Deep Web query interfaces that belong to a given domain. Modules for semi-automatic ontology construction and automatic ontology extension are introduced into the webpage and form content classification stages, and heuristic rules are added to the form structure classification stage. Experimental results show that the method effectively improves both the recall and the precision of Deep Web data source discovery.
Key words: Deep Web, ontology, data sources, semi-automatic construction, classification model
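The three classification stages named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the ontology is reduced to a hypothetical flat term set (`BOOK_TERMS`), relevance is a simple term-overlap count with arbitrary thresholds, and the two structural rules (no password field, at least two query fields) are illustrative heuristics rather than the rules used in the paper.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for a domain ontology: a flat set of concept terms.
BOOK_TERMS = {"book", "author", "title", "isbn", "publisher"}

# Field kinds that a user fills in to pose a query (illustrative choice).
QUERY_FIELD_KINDS = {"text", "search", "select", "textarea", "radio", "checkbox"}


class FormCollector(HTMLParser):
    """Collects the page text and, for each <form>, its fields and inner text."""

    def __init__(self):
        super().__init__()
        self.page_text = []
        self.forms = []
        self._form = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._form = {"fields": [], "text": []}
        elif self._form is not None and tag in ("input", "select", "textarea"):
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            self._form["fields"].append((kind, attrs.get("name") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._form is not None:
            self.forms.append(self._form)
            self._form = None

    def handle_data(self, data):
        self.page_text.append(data)
        if self._form is not None:
            self._form["text"].append(data)


def term_hits(text, terms):
    """Number of distinct ontology terms that occur in the text."""
    return len(set(re.findall(r"[a-z]+", text.lower())) & terms)


def page_is_relevant(page_text, terms, min_hits=2):
    # Stage 1: webpage classification (threshold is an assumption).
    return term_hits(page_text, terms) >= min_hits


def form_content_is_relevant(form, terms, min_hits=1):
    # Stage 2: form content classification over labels and field names.
    names = [name for _, name in form["fields"]]
    return term_hits(" ".join(form["text"] + names), terms) >= min_hits


def form_structure_is_query_interface(form):
    # Stage 3: form structure classification with illustrative heuristic rules.
    kinds = [kind for kind, _ in form["fields"]]
    if "password" in kinds:  # looks like a login or registration form
        return False
    return sum(kind in QUERY_FIELD_KINDS for kind in kinds) >= 2


def discover_query_forms(html_text, terms):
    """Return the forms on a page that pass all three stages."""
    collector = FormCollector()
    collector.feed(html_text)
    collector.close()
    if not page_is_relevant(" ".join(collector.page_text), terms):
        return []
    return [
        form for form in collector.forms
        if form_content_is_relevant(form, terms)
        and form_structure_is_query_interface(form)
    ]
```

Calling `discover_query_forms(html_text, BOOK_TERMS)` on a page's HTML returns the forms that survive all three filters. In the method described above the term set would come from the semi-automatically built and automatically extended ontology; the fixed set here only marks where that component would plug in.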
李道申, 刘勇. 基于本体的Deep Web数据源发现方法[J]. 计算机工程, 2012, 38(04): 52-54.
LI Dao-Shen, LIU Yong. Deep Web Data Sources Discovery Method Based on Ontology[J]. Computer Engineering, 2012, 38(04): 52-54.