Computer Engineering ›› 2012, Vol. 38 ›› Issue (23): 47-50. doi: 10.3969/j.issn.1000-3428.2012.23.011

• Software Technology and Database •

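Study of Real-time Heterogeneous Data Integration Model Based on Reverse Cleaning

TANG Yu 1, CHEN Hao 1, YE Bai-long 2

  (1. School of Information Science and Engineering, Hunan University, Changsha 410082, China; 2. School of Civil Engineering and Architecture, Central South University, Changsha 410083, China)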
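  • Received: 2012-03-12 Online: 2012-12-05 Published: 2012-12-03
  • About the authors: TANG Yu (born 1987), female, master's student, main research interest: Web data mining; CHEN Hao, associate professor, Ph.D.; YE Bai-long, professor
  • Supported by:
    National Natural Science Foundation of China (61070194); National Innovation Fund of China (11C26214305383)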


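Abstract: To address two problems in heterogeneous data integration, namely the quality of the data sources themselves and the real-time updating of target data, this paper proposes a heterogeneous data integration model based on reverse cleaning, built on adapter, XML, and reverse cleaning techniques. The model processes heterogeneous data along two paths. First, real-time threads extract, clean, and save newly added or modified source data, so that the target data is updated in real time. Second, a reverse cleaning process uses valid data, either from the platform or produced by integration, to repair errors and missing values in the original source data. Experimental results show that the model improves the quality of both the source data and the target data.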

Key words: heterogeneous data, data integration, reverse cleaning, Extract-Transform-Load (ETL) process, adapter, data quality
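
The two processing paths named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the `forward_clean` and `reverse_clean` helpers, and the `is_valid` rule are all hypothetical names introduced here, and the real-time threads, adapters, and XML layer are left out for brevity.

```python
from typing import Callable, Dict, Optional

# Hypothetical record type: field name -> value (None means missing).
Record = Dict[str, Optional[str]]


def forward_clean(record: Record) -> Record:
    """Forward path: normalise a newly added or modified source record.

    Whitespace is trimmed and empty strings are treated as missing.
    """
    return {
        key: (value.strip() if isinstance(value, str) and value.strip() else None)
        for key, value in record.items()
    }


def reverse_clean(
    source: Record,
    trusted: Record,
    is_valid: Callable[[str, Optional[str]], bool],
) -> Record:
    """Reverse path: repair missing or invalid source fields from trusted data.

    `trusted` stands in for valid data held on the platform or produced
    by integration; fields that already pass validation are left alone.
    """
    repaired = dict(source)
    for field, good_value in trusted.items():
        if good_value is None:
            continue  # nothing trustworthy to copy back
        if not is_valid(field, repaired.get(field)):
            repaired[field] = good_value
    return repaired


def is_valid(field: str, value: Optional[str]) -> bool:
    """Assumed validation rule: values must be present; postcodes are 6 digits."""
    if value is None:
        return False
    if field == "postcode":
        return value.isdigit() and len(value) == 6
    return True


# Illustrative data only.
source = {"id": "42", "city": "  ", "postcode": "41008"}
trusted = {"id": "42", "city": "Changsha", "postcode": "410082"}
repaired = reverse_clean(forward_clean(source), trusted, is_valid)
```

In this sketch the blank `city` and the five-digit `postcode` in the source record are both replaced by the trusted values, while the valid `id` is left unchanged.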


CLC number: