Computer Engineering ›› 2008, Vol. 34 ›› Issue (13): 51-53. doi: 10.3969/j.issn.1000-3428.2008.13.019

• Software Technology and Database •

Rules-based Deep Web Information Retrieval

YANG Ju-feng1, SHI Guang-shun1, ZHAO Yu-juan1,2, WANG Qing-ren1   

  1. Institute of Machine Intelligence, Nankai University, Tianjin 300071; 2. Tianjin Meteorological Information Center, Tianjin 300074
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-07-05 Published: 2008-07-05

Abstract: This paper proposes a novel rules-based model for Deep Web information retrieval. The model comprises four layers. Its main processing stages, such as task allocation, information extraction, and data cleaning, are assisted by structure rules, logic rules, and application rules specific to the Deep Web. The model is applied in three domains: scientific literature retrieval, electronic air ticket ordering, and resume searching. Experimental results show that the model is flexible and reliable, with a recall of valid information above 96%.

Key words: information retrieval, Deep Web, rules set, data extraction
