Abstract: To address data quality problems in XML data, this paper builds a metadata model of the XML data cleaning process by introducing a Bayesian learning method and a Markov chain probability-transition strategy. Drawing on the idea of comprehensively cleaning similar duplicate records in structured data, it proposes a new method for intelligently cleaning XML data. Experimental results show that, compared with other methods, the proposed method is more highly automated and requires less manual intervention, and it improves precision and recall by 2% to 5%.
Key words: XML database, data cleaning, Bayes formula, Markov chain
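The abstract does not give the model's details. As a rough illustration of the Bayes-formula idea only, the sketch below scores pairs of XML records as likely duplicates by combining per-field agreement evidence under a naive (conditional independence) assumption, then measures precision and recall against known duplicates. Everything in it is hypothetical and not taken from the paper: the `<book>` sample data, the field list, the likelihoods `M_PROB`/`U_PROB`, the prior `PRIOR_DUP`, both thresholds, and the use of `difflib.SequenceMatcher` as the field-similarity measure. The Markov chain transition strategy is not modeled.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample data: records "1" and "2" are near-duplicates.
XML_DATA = """<library>
  <book id="1"><title>Data Cleaning Basics</title><author>Liu Bo</author><year>2008</year></book>
  <book id="2"><title>Data Cleaning Basic</title><author>Liu  Bo</author><year>2008</year></book>
  <book id="3"><title>XML Databases</title><author>Yang Luming</author><year>2007</year></book>
  <book id="4"><title>Markov Chains</title><author>Deng Yunlong</author><year>2006</year></book>
</library>"""

FIELDS = ("title", "author", "year")
# Hypothetical likelihoods: P(field agrees | duplicate) and P(field agrees | distinct).
M_PROB = {"title": 0.9, "author": 0.9, "year": 0.95}
U_PROB = {"title": 0.05, "author": 0.1, "year": 0.3}
PRIOR_DUP = 0.1           # hypothetical prior P(duplicate)
SIM_THRESHOLD = 0.85      # similarity at or above which a field "agrees"
DECISION_THRESHOLD = 0.5  # posterior above which a pair is flagged


def normalize(text):
    # Collapse whitespace and lowercase so "Liu  Bo" and "liu bo" compare equal.
    return " ".join((text or "").split()).lower()


def load_records(xml_text):
    # Map each <book> id to its normalized field values.
    root = ET.fromstring(xml_text)
    return {
        rec.get("id"): {f: normalize(rec.findtext(f)) for f in FIELDS}
        for rec in root.findall("book")
    }


def field_agrees(a, b):
    return SequenceMatcher(None, a, b).ratio() >= SIM_THRESHOLD


def posterior_duplicate(r1, r2):
    # Bayes formula: P(dup | evidence) = P(evidence | dup) P(dup) / P(evidence),
    # with field agreements assumed conditionally independent given the class.
    joint_dup, joint_distinct = PRIOR_DUP, 1.0 - PRIOR_DUP
    for f in FIELDS:
        if field_agrees(r1[f], r2[f]):
            joint_dup *= M_PROB[f]
            joint_distinct *= U_PROB[f]
        else:
            joint_dup *= 1.0 - M_PROB[f]
            joint_distinct *= 1.0 - U_PROB[f]
    return joint_dup / (joint_dup + joint_distinct)


def detect_duplicates(records):
    # Compare every unordered pair of record ids and keep the likely duplicates.
    return {
        (a, b)
        for a, b in combinations(sorted(records), 2)
        if posterior_duplicate(records[a], records[b]) > DECISION_THRESHOLD
    }


def precision_recall(predicted, actual):
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    records = load_records(XML_DATA)
    found = detect_duplicates(records)
    print("Flagged pairs:", sorted(found))
    print("Precision/recall:", precision_recall(found, {("1", "2")}))
```

On this toy data only the pair `("1", "2")` is flagged, so precision and recall are both 1.0. In a real system the likelihoods and the prior would be estimated from labeled record pairs rather than fixed by hand.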
LIU Bo; YANG Lu-ming; LEI Gang-yue; DENG Yun-long. Intelligence Data Cleaning Strategy for XML Database[J]. Computer Engineering, 2008, 34(16): 16-18.