
Computer Engineering ›› 2006, Vol. 32 ›› Issue (8): 107-109, 203.

• Software Technology and Database •

A Clustering Algorithm Based on an Extensible Function Family

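LI Dai 1, DENG Xubin 1, ZHU Yangyong 1,2

  1. Department of Computer and Information Technology, Fudan University, Shanghai 200433; 2. Shanghai Center for Bioinformation Technology, Shanghai 201203
  • Publication date: 2006-04-20 Release date: 2006-04-20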

VI-DE: A Visual Editing and Debugging Environment for DE-Wrapper

LI Dai 1, DENG Xubin 1, ZHU Yangyong 1,2

  1. Department of Computer and Information Technology, Fudan University, Shanghai 200433; 2. Shanghai Center for Bioinformation Technology, Shanghai 201203
  • Online: 2006-04-20 Published: 2006-04-20

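Abstract: This paper introduces VI-DE, a visual editing and debugging environment for the data extraction tool DE-Wrapper. DE-Wrapper describes the structure of a data source with an extended regular expression (ERE), constructs a data extraction tree (DE-tree) from that ERE, generates the corresponding relational database schema from the DE-tree, and finally extracts the data. VI-DE integrates the DE-Wrapper workflow. The tool first supports visual construction of the ERE/DE-tree through a graphical interface, then automatically checks whether the ERE/DE-tree is ambiguous, and finally runs the extraction algorithm on sample data and presents the database structure and extraction results for the user to evaluate, thereby guiding the user step by step toward an ERE/DE-tree that meets the requirements. VI-DE has been used to build the first integrated online bioinformatics data warehouse system in China.

Key words: data extraction; extended regular expression; DE-Wrapper; DE-tree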

Abstract: This paper describes VI-DE, a visual editing and debugging environment for DE-Wrapper. DE-Wrapper is a data extraction tool that describes the structure of a data source with an extended regular expression (ERE), creates a data extraction tree (DE-tree) from the ERE, generates the relational tables, and finally extracts the data. VI-DE unifies the working process of DE-Wrapper. First, VI-DE enables visual construction of the DE-tree. Second, it automatically detects ambiguity in the ERE. Third, it runs a data pre-extraction and shows the generated relational tables and the extraction results in a GUI. Finally, it runs the data extraction. In this way it helps a user design, step by step, an ERE/DE-tree that satisfies the requirements. VI-DE has been applied to build the first online integrated biological data warehouse in China.

Key words: Data extraction; Extended regular expressions; DE-Wrapper; DE-Tree
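
The workflow summarized in the abstract (describe the source structure, derive relational tables, then extract) can be illustrated with a rough sketch. The abstract does not give the ERE syntax or the DE-tree construction, so this example substitutes Python's standard `re` named groups for the ERE and a nested repeating group for a child node of the tree. The record layout (`ID`, `OS`, `KW` lines), the table names, and the function names are hypothetical and are not taken from DE-Wrapper.

```python
import re
import sqlite3

# Assumption: a flat-file source where each record has one ID line, one OS line,
# and one or more KW lines. Standard `re` stands in for the paper's ERE.
RECORD_PATTERN = re.compile(
    r"ID\s+(?P<entry_id>\w+)\n"
    r"OS\s+(?P<organism>[^\n]+)\n"
    r"(?P<keywords>(?:KW\s+[^\n]+\n)+)"
)
# The repeated part of a record becomes a child table (one row per repetition).
KEYWORD_PATTERN = re.compile(r"KW\s+(?P<keyword>[^\n]+)\n")

SAMPLE = (
    "ID   P12345\n"
    "OS   Homo sapiens\n"
    "KW   kinase\n"
    "KW   membrane\n"
    "ID   Q67890\n"
    "OS   Mus musculus\n"
    "KW   transport\n"
)


def build_schema(conn):
    """Create one table per level of the (hypothetical) extraction structure."""
    conn.execute("CREATE TABLE entry (entry_id TEXT PRIMARY KEY, organism TEXT)")
    conn.execute("CREATE TABLE keyword (entry_id TEXT, keyword TEXT)")


def extract(text, conn):
    """Match every record in the text and insert its fields into the tables."""
    for record in RECORD_PATTERN.finditer(text):
        conn.execute(
            "INSERT INTO entry VALUES (?, ?)",
            (record["entry_id"], record["organism"]),
        )
        for kw in KEYWORD_PATTERN.finditer(record["keywords"]):
            conn.execute(
                "INSERT INTO keyword VALUES (?, ?)",
                (record["entry_id"], kw["keyword"]),
            )


def run_sample():
    """Run the extraction on the sample data and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    extract(SAMPLE, conn)
    entries = conn.execute(
        "SELECT entry_id, organism FROM entry ORDER BY entry_id"
    ).fetchall()
    keywords = conn.execute(
        "SELECT entry_id, keyword FROM keyword ORDER BY entry_id, keyword"
    ).fetchall()
    conn.close()
    return entries, keywords
```

Running the extraction on a small sample and inspecting the resulting rows loosely mirrors the pre-extraction step the abstract describes, in which the user reviews the tables and results before committing to a full run.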