
Computer Engineering ›› 2010, Vol. 36 ›› Issue (13): 51-53. doi: 10.3969/j.issn.1000-3428.2010.13.018

• Software Technology and Database •

Identification Method of Approximately Duplicate Records Based on Fuzzy Integrated Estimation

XIAO Man-sheng1, ZHOU Hao-hui2, WANG Hong1

  1. College of Science and Technology, Hunan University of Technology, Zhuzhou 412008
  2. Changsha Commerce & Tourism College, Changsha 410004
  • Online: 2010-07-05 Published: 2010-07-05
  • About the authors: XIAO Man-sheng (born 1968), male, associate professor, main research interests: database technology and data mining; ZHOU Hao-hui and WANG Hong, lecturers
  • Supported by:
    Scientific Research Fund of the Hunan Provincial Education Department (09C339); Hunan Provincial Science and Technology Plan Fund (2008CK3083)

Abstract: In approximately duplicate record identification based on character string matching, attribute weights are usually determined too subjectively. To address this problem, this paper proposes a method that obtains attribute weights through fuzzy integrated estimation. Multiple users rate, on a graded scale, the factors that make up each attribute's importance; a fuzzy mapping then turns these ratings into weights that reflect attribute importance, and approximately duplicate records are identified on the basis of those weights. Theoretical analysis and experiments show that the method obtains the attribute weights objectively and therefore achieves relatively high precision in identifying approximately duplicate records.

Key words: fuzzy integrated estimation, approximately duplicate records, attribute weight, similarity
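The abstract gives only the outline of the method, so the following Python is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a four-level grade scale, fixed factor weights, the weighted-average fuzzy operator, and Python's `difflib.SequenceMatcher` as a stand-in for the paper's string matching; all function names and values are hypothetical.

```python
from difflib import SequenceMatcher

# Assumed grade scale (very important ... unimportant); not taken from the paper.
GRADE_SCORES = [1.0, 0.75, 0.5, 0.25]


def membership_matrix(ratings, n_grades):
    """ratings[u][f] is the grade index user u gave to factor f.
    Returns R, where R[f][g] is the share of users choosing grade g for factor f."""
    n_users = len(ratings)
    n_factors = len(ratings[0])
    matrix = [[0.0] * n_grades for _ in range(n_factors)]
    for user_row in ratings:
        for f, g in enumerate(user_row):
            matrix[f][g] += 1.0 / n_users
    return matrix


def attribute_score(ratings, factor_weights, grade_scores=GRADE_SCORES):
    """Fuzzy evaluation of one attribute: B = A . R, then defuzzify with grade scores."""
    r = membership_matrix(ratings, len(grade_scores))
    # Weighted-average operator (an assumption; other fuzzy operators exist).
    b = [sum(factor_weights[f] * r[f][g] for f in range(len(r)))
         for g in range(len(grade_scores))]
    return sum(bg * s for bg, s in zip(b, grade_scores))


def attribute_weights(ratings_by_attr, factor_weights):
    """Normalize the per-attribute scores so that the weights sum to 1."""
    scores = {a: attribute_score(r, factor_weights)
              for a, r in ratings_by_attr.items()}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


def record_similarity(rec_a, rec_b, weights):
    """Weighted sum of per-field string similarities, in [0, 1]."""
    return sum(w * SequenceMatcher(None, rec_a[a], rec_b[a]).ratio()
               for a, w in weights.items())


if __name__ == "__main__":
    # Two users rate two importance factors for each attribute (made-up data).
    ratings = {"name": [[0, 1], [0, 0]], "phone": [[2, 3], [2, 2]]}
    weights = attribute_weights(ratings, factor_weights=[0.6, 0.4])
    a = {"name": "Xiao Mansheng", "phone": "0731-1234"}
    b = {"name": "Xiao Man-sheng", "phone": "0731 1234"}
    print(weights, record_similarity(a, b, weights))
```

In a sketch like this, two records would be flagged as approximate duplicates when `record_similarity` exceeds a chosen threshold; the threshold is also an assumption, since the abstract does not state one.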

CLC number: