Computer Engineering

• Security Technology

A Partial Deletion Strategy of Set-valued Data Anonymization

XU Xin-hui, PAN Chao

  1. (Department of Computer Science & Engineering, Shanghai Jiaotong University, Shanghai 200240, China)
  • Received: 2012-11-22 Online: 2013-11-15 Published: 2013-11-13
  • About the authors: XU Xin-hui (b. 1989), male, master's student, research interest: data privacy protection; PAN Chao, bachelor's degree

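Abstract (translated from the Chinese): To address privacy protection in set-valued data publishing, this paper proposes a multi-round iterative partial deletion strategy. The strategy makes no assumption about how the data recipient will use the data and places no limit on the number of association rules available as prior knowledge. While reducing information loss, it preserves the mineable safe strong association rules and prevents strong association rules about sensitive information from appearing in the anonymized data. Experimental results show that, compared with the classic generalization and global deletion strategies, the strategy reduces information loss by about 30% on average and retains at least 25% of the original safe strong association rules, demonstrating its advantage.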

Abstract: Privacy preservation in set-valued data publishing is an important problem. To address it, this paper presents an iterative strategy that anonymizes set-valued data through partial deletion. The strategy ensures that no strong inference of sensitive information is possible regardless of how much background knowledge the attacker possesses, and it makes no particular assumption about the downstream use of the data. It attempts to retain as many mineable, useful association rules as possible in the anonymized data while minimizing the number of item deletions. Experimental results show that partial deletion significantly outperforms generalization and global deletion, two popular existing anonymization techniques, reducing the number of deletions by 30% on average and retaining 25% more rules.

Key words: data anonymization, partial deletion, global deletion, generalization, set-valued data, information loss
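
The idea of partial deletion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, only a toy under stated assumptions: records are sets of items, a rule "antecedent → sensitive item" counts as strong when its confidence exceeds a hypothetical threshold `rho`, antecedents are restricted to non-sensitive itemsets of at most `max_len` items (the paper itself places no such limit), and each round deletes the sensitive item from a single supporting record. The names `violating_rule` and `partial_delete` are invented for this illustration.

```python
from itertools import combinations


def violating_rule(records, sensitive, rho, max_len=2):
    """Return (antecedent, sensitive_item) for one rule with confidence > rho, or None.

    Assumption: antecedents are non-sensitive itemsets of size 1..max_len.
    """
    public = sorted({item for r in records for item in r} - sensitive)
    for size in range(1, max_len + 1):
        for combo in combinations(public, size):
            antecedent = set(combo)
            covered = [r for r in records if antecedent <= r]
            if not covered:
                continue
            for s in sorted(sensitive):
                hits = sum(1 for r in covered if s in r)
                if hits / len(covered) > rho:  # confidence of antecedent -> s
                    return antecedent, s
    return None


def partial_delete(records, sensitive, rho, max_len=2):
    """Iteratively remove sensitive items from individual records only.

    Each round finds one strong rule about a sensitive item and deletes that
    item from a single supporting record, so other occurrences survive.
    Returns the anonymized copy and the number of deleted items.
    """
    data = [set(r) for r in records]  # work on a copy
    deletions = 0
    while True:
        rule = violating_rule(data, sensitive, rho, max_len)
        if rule is None:
            return data, deletions
        antecedent, s = rule
        for r in data:
            if antecedent <= r and s in r:
                r.remove(s)  # partial deletion: this record only
                deletions += 1
                break
```

In this toy setting, global deletion would instead remove every occurrence of an offending item from the whole dataset, which is why deleting from only some records can lose less information.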

CLC number: