作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2022, Vol. 48 ›› Issue (5): 136-144. doi: 10.19678/j.issn.1000-3428.0062382

• 网络空间安全 • 上一篇    下一篇

基于本地差分隐私的空间数据自适应划分算法

金媛媛1,2, 倪志伟1,2, 朱旭辉1,2, 陈恒恒1,2, 陈千1,2   

  1. 1. 合肥工业大学 管理学院, 合肥 230009;
    2. 合肥工业大学 过程优化与智能决策教育部重点实验室, 合肥 230009
  • 收稿日期:2021-08-17 修回日期:2021-09-30 发布日期:2022-05-10
  • 作者简介:金媛媛(1998—),女,硕士研究生,主研方向为信息安全、数据管理;倪志伟(通信作者),教授、博士;朱旭辉,讲师、博士;陈恒恒,硕士;陈千,博士。
  • 基金资助:
    国家自然科学基金“大数据环境下协同商务智能构建中的关键技术研究”(91546108);安徽省科技重大专项“面向混合云数据中心的隐私安全防护关键技术及其虚拟化设备研发”(201903a05020020)。

Spatial Data Adaptive Partition Algorithm Based on Local Differential Privacy

JIN Yuanyuan1,2, NI Zhiwei1,2, ZHU Xuhui1,2, CHEN Hengheng1,2, CHEN Qian1,2   

  1. 1. School of Management, Hefei University of Technology, Hefei 230009, China;
    2. Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, Hefei University of Technology, Hefei 230009, China
  • Received:2021-08-17 Revised:2021-09-30 Published:2022-05-10

摘要: 空间位置数据分布通常具有不均匀性,不同位置区域的密度差异较大,在本地差分隐私模型中无法直接获取用户真实的位置数据,使得空间位置划分方法受到限制以及数据发布存在查询精度低、通信代价大等问题。为在本地差分隐私模型下的大规模空间数据采集和发布过程中进行空间划分,提出一种空间数据分层自适应划分算法KDG-HT。通过收集部分用户的数据来初步获取区域的分布情况,采用KD-树的思想划分区域,并利用抽样技术对用户进行分组,根据分组用户统计结果所提供的先验知识来完成多层细粒度划分。在此基础上,结合差分隐私模型的并行组合特性分层扰动用户数据,从总体上实现发布数据的ε-差分隐私保护。实验结果表明,KDG-HT算法适用于具有不同数据分布情况的大规模空间数据集,查询精度及运行效率优于RAPPOR、UG、GT-R等算法,其中与GT-R算法相比,KDG-HT算法发布数据的查询精度最高提升3倍,运行效率提高17%。

关键词: 本地差分隐私, 空间自适应划分, 用户随机采样, 空间范围查询, 随机响应

Abstract: The distribution of spatial location data is typically uneven, and the density of different location areas varies significantly.In the local differential privacy model, the actual location data of users cannot be obtained directly, which limits the spatial location segmentation method.Additionally, problems exist in data publishing, such as low query accuracy and high communication cost.To segment a space during large-scale spatial data acquisition and distribution using the local differential privacy model, a hierarchical adaptive spatial data partition algorithm, KDG-HT, is proposed.By acquiring the data of some users to preliminarily obtain the distribution of a region, the region can be segmented using the concept of the KD-tree, and the users can be categorized via sampling technology.By applying prior knowledge provided by the statistical results of the grouped users, the multilayer fine-grained segmentation is completed.Subsequently, combined with the parallel combination characteristics of the differential privacy model, the user data are disturbed hierarchically, and the publishing data are realized as a complete ε-differential privacy protection.Experimental results show that the KDG-HT algorithm is suitable for large-scale spatial datasets with different data distributions, and that its query accuracy and operation efficiency are better than those of RAPPOR, UG, GT-R, and other algorithms.Compared with the GT-R algorithm, the KDG-HT algorithm can improve the query accuracy of published data by up to three times and the operation efficiency by 17%.

Key words: local differential privacy, spatial adaptive partition, user random sampling, spatial range query, randomized response

中图分类号: