Abstract: This paper proposes a method for discovering Deep Web data sources by means of a search engine. To submit high-quality keywords to the search engine, an ontology is introduced into the construction of the initial word set as a framework for organizing vocabulary hierarchically. All words are first classified by how frequently they occur in the current domain and are then reclassified by the number of elements in the interface set returned by the search engine, which ensures that the selected keywords are those that contribute most to discovering data source query interfaces. Tests on different domains show that the method discovers a considerable number of query interfaces, which verifies its effectiveness.
Key words:
data source discovery,
Deep Web,
ontology
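The two-stage keyword selection summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nested-dictionary ontology, the document-frequency threshold, and the `count_interfaces` callable (a stand-in for submitting a term to a search engine and counting the query interfaces among the results) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def collect_terms(ontology: Dict[str, dict]) -> List[str]:
    """Flatten a nested concept -> sub-concepts dictionary, depth-first."""
    terms: List[str] = []
    for concept, children in ontology.items():
        terms.append(concept)
        terms.extend(collect_terms(children))
    return terms


def classify_by_frequency(domain_docs: Iterable[str],
                          vocabulary: Iterable[str],
                          high_freq_threshold: int = 2) -> Dict[str, List[str]]:
    """First pass: split terms by the number of domain documents containing them.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    doc_counts: Counter = Counter()
    for doc in domain_docs:
        # Count each term at most once per document.
        for token in set(doc.lower().split()):
            doc_counts[token] += 1
    classes: Dict[str, List[str]] = {"high": [], "low": []}
    for term in vocabulary:
        key = "high" if doc_counts[term.lower()] >= high_freq_threshold else "low"
        classes[key].append(term)
    return classes


def reclassify_by_interfaces(terms: Iterable[str],
                             count_interfaces: Callable[[str], int],
                             min_interfaces: int = 1) -> List[str]:
    """Second pass: keep terms whose results contain enough query interfaces.

    Surviving terms are ordered by interface count, largest first.
    """
    scored = [(term, count_interfaces(term)) for term in terms]
    kept = [(term, n) for term, n in scored if n >= min_interfaces]
    return [term for term, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]


# Toy data; a real run would query a search engine instead of a lookup table.
ontology = {"flight": {"airline": {}, "booking": {}}, "hotel": {}}
docs = ["cheap flight ticket booking",
        "flight schedule and airline booking",
        "hotel booking"]
interface_counts = {"flight": 5, "booking": 0, "airline": 3, "hotel": 7}

classes = classify_by_frequency(docs, collect_terms(ontology))
keywords = reclassify_by_interfaces(classes["high"],
                                    lambda term: interface_counts.get(term, 0))
```

In this toy run, "booking" is frequent in the domain but is dropped in the second pass because its stubbed result set contains no query interfaces, which is the kind of filtering the second classification step is meant to provide.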
CLC number:
WANG Hai-Long, HU Jing-Zhi, ZHAO Peng-Peng, CUI Zhi-Ming. Deep Web Data Source Discovery Based on Search Engine[J]. Computer Engineering, 2011, 37(5): 77-79,82.