Abstract:
This paper proposes a data provenance tracing system for the Extraction-Transform-Load (ETL) process and discusses the key techniques in its implementation: transformation classification, metadata design, construction of transformation sequences, design of the tracing process, and the tracing method for each type of transformation. The metadata required for tracing is embedded in the package file structure. When a backward tracing query is issued, the system extracts this metadata and builds a provenance information graph for each level of transformation, which allows the provenance of the data to be traced.
Key words:
data provenance,
provenance management system,
Extraction-Transform-Load (ETL),
synchronous/asynchronous transformation
摘要: 提出一种面向提取-转换-加载(ETL)过程的数据起源追踪系统,讨论实现的关键技术,包括转换分类、元数据设计、转换序列构建、追踪流程设计以及不同转换的追踪方法。系统将追踪所需的元数据设计在包文件结构中,在逆向追踪时抽取元数据进行相关处理,构建各个层次的转换起源信息图,从而实现数据起源的追踪。
关键词:
数据起源,
起源管理系统,
提取-转换-加载,
同步/异步转换
CLC Number:
DAI Chao-Fan, WANG Tao. Data Provenance Tracing System for Extraction-Transform-Load[J]. Computer Engineering, 2011, 37(17): 256-258,261.
戴超凡, 王涛. 面向ETL的数据起源追踪系统[J]. 计算机工程, 2011, 37(17): 256-258,261.