作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2024, Vol. 50 ›› Issue (2): 154-164. doi: 10.19678/j.issn.1000-3428.0067800

• 网络空间安全 • 上一篇    下一篇

基于非定长编码和滑动窗口的隐私保护记录链接方法

叶晓东1,2, 赵迎迎1,2, 孙永奇1,2,*(), 赵思聪3, 刘真1,2   

  1. 1. 北京交通大学计算机与信息技术学院, 北京 100044
    2. 交通大数据与人工智能教育部重点实验室, 北京 100044
    3. 北京航天晨信科技有限责任公司, 北京 102308
  • 收稿日期:2023-06-05 出版日期:2024-02-15 发布日期:2023-09-05
  • 通讯作者: 孙永奇
  • 基金资助:
    科技创新2030—“新一代人工智能”重大项目(2021ZD0113002)

Privacy-Preserving Record Linkage Method Based on Variable-Length Coding and Sliding Window

Xiaodong YE1,2, Yingying ZHAO1,2, Yongqi SUN1,2,*(), Sicong ZHAO3, Zhen LIU1,2   

  1. 1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
    2. Key Laboratory of Big Data and Artificial Intelligence in Transportation, Ministry of Education, Beijing 100044, China
    3. Beijing Aerocim Technology Co., Ltd., Beijing 102308, China
  • Received:2023-06-05 Online:2024-02-15 Published:2023-09-05
  • Contact: Yongqi SUN

摘要:

隐私保护记录链接(PPRL)是一种跨不同数据库高效识别同一实体对象对应的记录而不泄露记录所代表实体对象的敏感或机密信息的方法。布隆过滤器(BF)广泛应用于PPRL,其将记录中的敏感信息进行编码并使用字符q-gram实现近似匹配。但是,BF编码容易遭受密码分析攻击,且由于对q-gram位置不敏感,会导致记录匹配的精确率较低。提出一种基于非定长编码和滑动窗口的PPRL方法,其采用的非定长编码记录生成方式不仅使记录具有位置敏感性,而且通过对有效位前后添加随机位数组隐藏了实体的位数组频率信息,从而能够有效防御频率攻击。此外,设计一种基于滑动窗口的记录链接方式,先通过快速过滤筛除大量不匹配的记录,再使用双向滑动窗口的精确匹配策略对剩余记录进行匹配,提高隐私保护记录的匹配效率。在公开数据集上的实验结果表明,相比BF方法,该方法在编码速度上快100倍左右,其同时具有更高的匹配精度,在跨数据库PPRL方面的安全性也更强。

关键词: 布隆过滤器, 字符串比较, 隐私保护, 记录链接, 安全实体对齐

Abstract:

Privacy-Preserving Record Linkage(PPRL) refers to the efficient identification of records corresponding to the same entity object across different databases without revealing sensitive or confidential information represented by the records. Bloom Filter(BF) is a widely used technique in PPRL, which encodes sensitive information in records and uses q-gram for approximate matching. However, BF encoding is vulnerable to cryptanalysis attacks, and its insensitivity to the q-gram position can result in a decrease in the precision of record matching. This study proposes a PPRL method based on variable-length coding and sliding window techniques. The method for generating the variable-length encoding record used in the method not only makes the record position-sensitive but also hides the frequency information of entity bit arrays by adding random bit arrays before and after the effective bits. This effectively defends against frequency attacks. In addition, a record linkage method based on sliding windows is designed, which first filters out a large number of non-matching records through a fast filter and then uses a bidirectional sliding window exact-matching strategy to match the remaining records. This improves the matching efficiency of the privacy-preserving records. The experimental results on public datasets show that the proposed method is approximately 100 times faster in encoding the speed than the BF method and has higher matching accuracy. It also has stronger security in cross-database PPRL.

Key words: Bloom Filter(BF), string comparison, privacy protection, record linkage, secure entity alignment