Computer Engineering ›› 2009, Vol. 35 ›› Issue (21): 85-87. doi: 10.3969/j.issn.1000-3428.2009.21.028

• Software Technology and Database •

Method for Approximately Duplicate Metadata Record Detection

WANG Chang-wu, HAN Jing-hua, ZHANG Fu-zhi   

  1. (College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004)
  • Online: 2009-11-05 Published: 2009-11-05

Abstract: Detecting and managing duplicate metadata records in a federated digital library is key to ensuring metadata quality and improving the quality of federated retrieval services. Existing duplicate record detection methods for federated digital libraries are computationally intensive and not very accurate. To address these shortcomings, this paper proposes a fast and efficient method for detecting approximately duplicate metadata records, based on an improved N-Gram method and suited to relatively large federated digital libraries. Simulation results show that the method effectively improves duplicate detection performance and speeds up detection.

Key words: metadata, duplicate record detection, N-Gram method, similarity
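
The abstract does not describe the improved N-Gram method itself, so the following is only a minimal sketch of the plain technique it builds on: comparing two metadata records by the overlap of their character n-grams. The function names, the trigram size, the Jaccard measure, the `title` and `creator` field names, and the 0.8 threshold are all assumptions made for illustration, not details taken from the paper.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a case- and whitespace-normalized string."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Strings shorter than n are treated as a single gram (or none if empty).
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity of the n-gram sets of two strings, in [0, 1]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty values are considered identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def is_near_duplicate(record_a, record_b, fields=("title", "creator"), threshold=0.8):
    """Flag two records as near-duplicates when the mean per-field similarity
    reaches the threshold. Field names and threshold are hypothetical."""
    scores = [ngram_similarity(record_a.get(f, ""), record_b.get(f, "")) for f in fields]
    return sum(scores) / len(scores) >= threshold
```

A pairwise check like this compares every record with every other, which is costly for large collections; the speed-up the abstract claims would have to come from the paper's own improvements, which this sketch does not attempt to reproduce.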
