Computer Engineering, 2006, Vol. 32, Issue (23): 52-54. doi: 10.3969/j.issn.1000-3428.2006.23.019

• Software Technology and Database •

Rules Engine Based Data Cleansing

YE Zhou, WANG Dong

  1. (School of Software, Shanghai Jiaotong University, Shanghai 200030)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2006-12-05 Published: 2006-12-05

Abstract: Previous research on data cleansing shares a shortcoming: detection and repair are either hard-coded, which is inflexible, or left to manual judgment, which is flexible but inefficient. This paper presents a rules engine based data cleansing architecture (REBDCA) that addresses this problem. REBDCA uses rules to describe cleansing logic and a rules engine to execute it, which lets it handle a wide range of data quality problems. An integration of REBDCA with an ETL tool is presented as an example, and its performance is measured and compared with a hard-coded implementation of the same logic.

Key words: Rules engine, Data cleansing, Extraction-transformation-loading (ETL)
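
The central idea of the abstract, rules that describe cleansing logic and an engine that executes them, can be illustrated with a minimal Python sketch. This is not the REBDCA implementation, which the abstract does not detail; the `Rule` class, the `clean` function, and the two sample rules are hypothetical names chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    """A hypothetical cleansing rule: a detection condition plus a repair action."""
    name: str
    condition: Callable[[Record], bool]  # detects a data quality problem
    action: Callable[[Record], None]     # repairs the record in place


def clean(records: List[Record], rules: List[Rule]) -> List[str]:
    """Apply every rule to every record; return a log of the rules that fired."""
    fired = []
    for index, record in enumerate(records):
        for rule in rules:
            if rule.condition(record):
                rule.action(record)
                fired.append(f"record {index}: {rule.name}")
    return fired


# Illustrative rules only (assumed, not taken from the paper).
rules = [
    Rule(
        name="trim_name",
        condition=lambda r: isinstance(r.get("name"), str)
        and r["name"] != r["name"].strip(),
        action=lambda r: r.update(name=r["name"].strip()),
    ),
    Rule(
        name="missing_city",
        condition=lambda r: not r.get("city"),
        action=lambda r: r.update(city="UNKNOWN"),
    ),
]

records = [
    {"name": "  Alice ", "city": "Shanghai"},
    {"name": "Bob", "city": ""},
]

log = clean(records, rules)
```

Because the rules are data rather than code paths, adding a new check means appending another `Rule` instead of modifying `clean`, which is the kind of flexibility that hard-coded cleansing lacks.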