
Computer Engineering (计算机工程) ›› 2020, Vol. 46 ›› Issue (11): 124-131. doi: 10.19678/j.issn.1000-3428.0056429

• Advanced Computing and Data Processing •

Automatic Combination and Detection of Data Cleaning Rule Chains Based on Petri Net

HE Jun1, ZHANG Yunfei1, ZHANG Dehai2

  1. College of Information Engineering, Kunming University, Kunming 650214, China;
  2. College of Software, Yunnan University, Kunming 650206, China
  • Received: 2019-10-28 Revised: 2019-12-01 Published: 2019-12-09
  • About the authors: HE Jun (born 1977), male, associate professor, Ph.D., main research interest: data analysis; ZHANG Yunfei, lecturer, M.S.; ZHANG Dehai, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61263043, 61864004); Joint Special Project for Basic Research of Local Undergraduate Universities in Yunnan Province (2017FH001-05).

Abstract: When traditional rule chains are executed sequentially on large-scale Data Cleaning (DC) tasks, they suffer from rule redundancy and logical conflicts. To address these problems, this paper proposes a method for automatically combining and checking rule chains. A three-layer rule base, consisting of general, domain-specific, and user-defined rules, is designed using context information. A Rule Chain Combination Model (RCCM) is then built on Petri Nets (PN) to generate rule chains automatically, check their logical correctness and state reachability, and select the preferred rule chain. Taking a DC application in the poverty alleviation domain of a certain region as a case study, experimental results from the RCCM implementation show that the method markedly reduces the amount of erroneous data produced and improves the quality and efficiency of DC.

Key words: Data Cleaning (DC), hierarchical rule base, rule chain, Petri Net (PN), logical conflict
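
As a rough illustration of the reachability idea mentioned in the abstract, the sketch below treats each cleaning rule as a Petri-net transition and searches the markings breadth-first. It is not the authors' RCCM: the `Rule` structure, the `explore` function, the place and rule names, and the simplification that every place holds at most one token are all assumptions made for this example.

```python
from collections import deque
from typing import NamedTuple


class Rule(NamedTuple):
    """A cleaning rule modeled as a Petri-net transition (hypothetical structure)."""
    name: str
    inputs: frozenset   # places that must be marked before the rule can fire
    outputs: frozenset  # places that become marked after the rule fires


def fire(marking: frozenset, rule: Rule) -> frozenset:
    """Fire an enabled rule: consume its input tokens, produce its output tokens."""
    return (marking - rule.inputs) | rule.outputs


def explore(rules, initial, target):
    """Breadth-first search over the markings reachable from `initial`.

    Returns a pair: the shortest rule chain that reaches a marking containing
    every place in `target` (or None if no such marking is reachable), and the
    set of rule names that are never enabled in any reachable marking.
    """
    start, goal = frozenset(initial), frozenset(target)
    seen = {start}
    queue = deque([(start, [])])
    fired = set()
    best = None
    while queue:
        marking, chain = queue.popleft()
        if best is None and goal <= marking:
            best = chain  # BFS order, so the first hit is a shortest chain
        for rule in rules:
            if rule.inputs <= marking:  # the rule is enabled in this marking
                fired.add(rule.name)
                nxt = fire(marking, rule)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, chain + [rule.name]))
    dead = {r.name for r in rules} - fired
    return best, dead


# Illustrative rules only; no rule produces "archived", so the last one is dead.
rules = [
    Rule("trim_whitespace", frozenset({"raw"}), frozenset({"trimmed"})),
    Rule("remove_duplicates", frozenset({"trimmed"}), frozenset({"deduplicated"})),
    Rule("validate_id", frozenset({"deduplicated"}), frozenset({"clean"})),
    Rule("fill_from_archive", frozenset({"archived"}), frozenset({"clean"})),
]

chain, dead = explore(rules, {"raw"}, {"clean"})
print(chain)
print(dead)
```

In this toy net the search finds a three-rule chain from `raw` to `clean` and reports `fill_from_archive` as never enabled. A rule that can never fire is one simple signal of a redundant or conflicting rule that a reachability check can surface. Picking the shortest chain is only a stand-in for the paper's rule chain selection, whose actual criteria are not given in the abstract.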

CLC number: