Computer Engineering, 2009, Vol. 35, Issue (4): 66-68. doi: 10.3969/j.issn.1000-3428.2009.04.023

• Software Technology and Database •

Improved XML Intelligent Data Cleaning Strategy

ZHAI Xue-min1, LIU Yuan1,2, LIU Bo3, BI Rong-rong1

  (1. Digital Media Creative Center, College of Information Engineering, Southern Yangtze University (Jiangnan University), Wuxi 214122; 2. School of Computer Science, Nanjing University of Science & Technology, Nanjing 210094; 3. College of Information, Central South University, Changsha 410083)
  • Online: 2009-02-20 Published: 2009-02-20

Abstract: To address quality problems in XML data, this paper builds a new XML data cleaning method on XML keys, combining a multi-template Hidden Markov Model (HMM) information extraction strategy with Particle Swarm Optimization (PSO). To make parallel detection of similar XML records more efficient, PSO is improved by means of a wave function. Simulation experiments show that, compared with other XML data cleaning algorithms, the method has strong adaptive learning capability, requires little manual intervention and has a low computational cost, with time performance improved by about 94%.

Key words: XML document set, XML key, Particle Swarm Optimization (PSO), data cleaning, Hidden Markov Model (HMM)
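The abstract does not spell out how XML keys drive duplicate detection. As a rough illustration only, the sketch below treats a key as a target element plus a set of key paths (here the hypothetical `book`, `title` and `author`) and flags elements whose normalized key values coincide as candidate duplicates. The function names and the normalization rule are assumptions, not the paper's procedure; exact matching on key values is the simplest case, and the similarity detection the paper speeds up with PSO would replace the equality test.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def normalize(text):
    # Hypothetical normalization: trim, collapse inner whitespace, lowercase.
    return " ".join((text or "").split()).lower()


def find_key_duplicates(xml_text, target_tag, key_paths):
    """Group <target_tag> elements whose key-path values coincide.

    target_tag and key_paths stand in for the target path and key paths
    of an XML key; both are illustrative, not taken from the paper.
    """
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for idx, elem in enumerate(root.iter(target_tag)):
        key = tuple(normalize(elem.findtext(path)) for path in key_paths)
        groups[key].append(idx)
    # Only key values shared by two or more elements are candidate duplicates.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


doc = """<library>
  <book><title>XML Basics</title><author>Li Ming</author></book>
  <book><title> xml  basics </title><author>LI MING</author></book>
  <book><title>Data Cleaning</title><author>Wang Fang</author></book>
</library>"""

print(find_key_duplicates(doc, "book", ["title", "author"]))
# {('xml basics', 'li ming'): [0, 1]}
```

The first two `book` elements differ only in spacing and case, so they collapse to the same key once normalized; the third has a distinct key and is not reported.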

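The multi-template HMM extraction step is also only named in the abstract. The sketch below shows the basic building block any HMM-based extractor relies on: Viterbi decoding that assigns a field label to each token of a record. The states (`author`, `title`, `year`), all probabilities and the token classes are invented for illustration. One plausible reading of "multi-template" is several such models, one per record layout, but the abstract does not say how the paper builds or selects them.

```python
import math

# Hypothetical field states and hand-set probabilities (not from the paper).
STATES = ["author", "title", "year"]
START = {"author": 0.8, "title": 0.15, "year": 0.05}
TRANS = {
    "author": {"author": 0.6, "title": 0.35, "year": 0.05},
    "title": {"author": 0.05, "title": 0.75, "year": 0.2},
    "year": {"author": 0.1, "title": 0.1, "year": 0.8},
}
EMIT = {
    "author": {"CAP": 0.85, "LOW": 0.1, "NUM": 0.05},
    "title": {"CAP": 0.3, "LOW": 0.65, "NUM": 0.05},
    "year": {"CAP": 0.05, "LOW": 0.05, "NUM": 0.9},
}


def token_class(tok):
    # Map a token to a coarse observation symbol.
    if tok.isdigit():
        return "NUM"
    if tok[:1].isupper():
        return "CAP"
    return "LOW"


def viterbi(tokens):
    obs = [token_class(t) for t in tokens]
    # best[s]: log-probability of the best path ending in state s.
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[t][s]: predecessor of s on the best path
    for o in obs[1:]:
        prev_best = best
        best, ptr = {}, {}
        for s in STATES:
            score, arg = max((prev_best[r] + math.log(TRANS[r][s]), r) for r in STATES)
            best[s] = score + math.log(EMIT[s][o])
            ptr[s] = arg
        back.append(ptr)
    # Trace the best final state back to the start.
    state = max(STATES, key=best.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(zip(tokens, reversed(path)))


tokens = "Wang Li improved data cleaning with particle swarms 2009".split()
print(viterbi(tokens))
```

Working in log space avoids underflow on long records; the two capitalized leading tokens decode as `author`, the lowercase run as `title` and the trailing number as `year`.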
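The abstract states that PSO is improved with a wave function but gives no formula, so that modification is not reproduced here. For reference, the sketch below shows only the standard global-best PSO with inertia weight that such a variant would start from: each particle's velocity is pulled toward its own best position and the swarm's best position, and positions are clamped to the search box. The objective (`sphere`), bounds and parameter values (common textbook defaults) are assumptions for illustration, not the paper's settings.

```python
import random


def sphere(x):
    # Hypothetical stand-in objective with its minimum 0 at the origin.
    return sum(v * v for v in x)


def pso(f, dim, bounds, n_particles=30, iters=200,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Standard global-best PSO (no wave-function modification)."""
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = hi - lo  # velocity clamp
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive (personal best) + social (global best) terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, vel[i][d]))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

A fixed seed makes runs reproducible; a modified variant such as the paper's would change the velocity or position update inside the inner loop while keeping the personal-best and global-best bookkeeping.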