
Computer Engineering, 2022, Vol. 48, Issue (4): 143-147. doi: 10.19678/j.issn.1000-3428.0061707

• Cyberspace Security •

MI Loss Evaluation Model for k-Anonymity in PPDM

GU Qingzhu, DONG Hongbin

  1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430000, China
  • Received: 2021-05-20 Revised: 2021-07-10 Published: 2022-04-14
  • About the authors: GU Qingzhu (born 1997), female, master's student, whose main research interest is privacy-preserving data mining; DONG Hongbin, professor.
  • Funding: National Natural Science Foundation of China, "Research on the Continuous Immune Response Mechanism of Computer Immune Intelligence and Its Applications" (61877045).

Abstract: Privacy-Preserving Data Mining (PPDM) uses techniques such as anonymization so that data owners can safely publish data sets that remain useful for data mining without revealing private information. The k-anonymity algorithm, one of the most widely used algorithms in PPDM research, offers low computational overhead, little data distortion, and resistance to linkage attacks. However, the data utility evaluation models used in some k-anonymity studies set their weights unreasonably, so the anonymous data set that the algorithm selects as optimal yields low accuracy in subsequent classification tasks. This paper proposes a Mutual Information Loss (MI Loss) evaluation model that computes the weights from mutual information. Mutual Information (MI) reflects the association between variables. The MI Loss evaluation model calculates a weight for each quasi-identifier from the mutual information between that quasi-identifier and the label, obtains the information loss of each quasi-identifier through the Loss formula, and takes the sum of the weighted quasi-identifier information losses as the information loss of the data set, which compensates for the shortcomings of the existing evaluation models. Experiments show that using the MI Loss evaluation model to guide the k-anonymity algorithm significantly reduces the utility loss of anonymous data sets in subsequent classification. Compared with the Loss and Entropy Loss models, the proposed model improves classification accuracy by 0.73% to 3.00%.

Key words: Privacy-Preserving Data Mining (PPDM), k-anonymity algorithm, data utility, classification accuracy, MI Loss evaluation model
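
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the normalization of the weights (each quasi-identifier's share of the total mutual information), the function names `mutual_information`, `mi_weights`, and `mi_loss`, the toy data, and the per-attribute loss values are all assumptions made for illustration. The per-attribute losses are taken as given inputs in [0, 1] rather than computed from a generalization hierarchy.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mi_weights(columns, labels):
    """Assumed normalization: weight = attribute MI / total MI over all attributes."""
    mi = {name: mutual_information(vals, labels) for name, vals in columns.items()}
    total = sum(mi.values())
    if total == 0:
        # No attribute is informative about the label: fall back to equal weights.
        return {name: 1 / len(mi) for name in mi}
    return {name: value / total for name, value in mi.items()}


def mi_loss(weights, attribute_losses):
    """Data set information loss as the weighted sum of per-attribute losses."""
    return sum(weights[name] * attribute_losses[name] for name in weights)


# Hypothetical toy data: two quasi-identifiers and a class label.
quasi_identifiers = {
    "age_group": ["young", "young", "old", "old"],
    "zip_prefix": ["A", "B", "A", "B"],
}
labels = ["<=50K", "<=50K", ">50K", ">50K"]

# Hypothetical per-attribute losses of one candidate anonymization.
attribute_losses = {"age_group": 0.5, "zip_prefix": 0.2}

weights = mi_weights(quasi_identifiers, labels)
print(weights)
print(mi_loss(weights, attribute_losses))
```

In this toy data `age_group` determines the label and `zip_prefix` is independent of it, so all of the weight falls on `age_group`, and generalizing `zip_prefix` costs nothing under the weighted score. An unweighted average would penalize both attributes equally, which is the kind of mismatch with downstream classification that the abstract attributes to existing evaluation models.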
