
Computer Engineering ›› 2026, Vol. 52 ›› Issue (6): 249-257. doi: 10.19678/j.issn.1000-3428.0070294

• Cyberspace Security •

A Federated Learning Privacy Protection Algorithm That Effectively Improves Data Availability

CAO Tianya, ZHANG Yufan*, JIA Junjie

  1. School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China
  • Received: 2024-08-28 Revised: 2025-01-06 Published: 2026-06-15 Online: 2025-03-05
  • Corresponding author: ZHANG Yufan
  • About the authors:

    CAO Tianya, male, associate professor, Ph.D.; main research interest: cryptography

    ZHANG Yufan (corresponding author), master's student

    JIA Junjie, associate professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (62362059); Natural Science Foundation of Gansu Province (23JRRA686)

Abstract:

Model aggregation in federated learning carries a risk of privacy leakage, and data quality is unbalanced across clients. As a result, the aggregated model is unfair to individual clients, server-side aggregation can be incomplete, and client data availability is low. To address these problems, a federated learning privacy protection algorithm that effectively improves data availability is proposed. The algorithm first perturbs each client's model parameters with a removable random mask, which avoids privacy leakage while the parameters are uploaded to the server without affecting the aggregation result. To account for the unbalanced data quality among clients, it dynamically adjusts each client's weight during server-side aggregation to improve data availability. In addition, the Shamir (t, n) threshold secret sharing method is used to distribute and reconstruct the uploaded model parameters, so that network delays or failed client uploads do not leave the aggregation result incomplete and reduce data availability. Experiments on the MNIST and CIFAR-10 datasets show that, while maintaining model accuracy, the proposed algorithm prevents client privacy leakage, reduces time overhead, and effectively improves data availability, improving model performance while achieving privacy protection.

Key words: federated learning, secure aggregation, privacy protection, dynamic updates, secret sharing
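
The two cryptographic building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no construction details, so the pairwise form of the removable masks, the field modulus `PRIME`, the integer encoding of parameters, and every function name below are assumptions made for illustration. The dynamic client weighting is left out because the abstract does not say how the weights are computed.

```python
import random

# Assumption: all arithmetic is done in the prime field of order 2**61 - 1.
PRIME = 2**61 - 1


def pairwise_masks(n_clients, dim, rng):
    """One mask vector per client; the masks sum to zero mod PRIME."""
    masks = [[0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = [rng.randrange(PRIME) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % PRIME  # client i adds
                masks[j][k] = (masks[j][k] - shared[k]) % PRIME  # client j subtracts
    return masks


def masked_sum(updates, masks):
    """Sum the masked uploads; the pairwise masks cancel in the total."""
    total = [0] * len(updates[0])
    for update, mask in zip(updates, masks):
        for k in range(len(total)):
            total[k] = (total[k] + update[k] + mask[k]) % PRIME
    return total


def shamir_split(secret, t, n, rng):
    """Split secret into n shares so that any t of them reconstruct it."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def shamir_reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


rng = random.Random(0)
updates = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # toy integer-encoded parameters
masks = pairwise_masks(len(updates), 3, rng)
aggregate = masked_sum(updates, masks)       # equals the unmasked sum [6, 12, 18]

shares = shamir_split(12345, t=3, n=5, rng=rng)
recovered = shamir_reconstruct(shares[:3])   # any 3 of the 5 shares suffice
```

In this sketch each individual upload looks random to the server, yet the sum is exact because every shared mask is added once and subtracted once. The threshold sharing shows why a late or failed upload need not block aggregation: any `t` of the `n` shares are enough to rebuild the shared value.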