A Federated Learning Privacy Protection Algorithm That Effectively Improves Data Availability

doi:10.19678/j.issn.1000-3428.0070294

Abstract

Federated learning carries a risk of privacy leakage during model aggregation and suffers from unbalanced data quality across clients. As a result, the aggregated model is unfair to individual clients and server-side aggregation is incomplete, which lowers the availability of client data. To address these problems, a federated learning privacy protection algorithm that effectively improves data availability is proposed. The algorithm first applies a removable random mask perturbation to each client's model parameters, which avoids privacy leakage while the data are uploaded to the server without affecting the aggregation result. To account for unbalanced data quality across clients, it dynamically adjusts each client's weight during server-side aggregation to improve data availability. In addition, the Shamir (t, n) threshold secret sharing method is used to distribute and reconstruct the uploaded model parameters. This prevents incomplete aggregation results, and the resulting drop in data availability, caused by network delays or unsuccessful client uploads. Experiments on the MNIST and CIFAR-10 datasets show that, while maintaining model accuracy, the algorithm prevents client privacy leakage, reduces time overhead, and effectively improves data availability, so that model performance improves while privacy is protected.

Key words: federated learning, secure aggregation, privacy protection, dynamic updates, secret sharing
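
The two building blocks named in the abstract, removable masks and Shamir (t, n) threshold secret sharing, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the masks are built, so the sketch assumes pairwise additive masks that cancel in the sum, and it assumes integer-encoded parameters in a prime field. The modulus `PRIME` and the function names `split_secret`, `reconstruct`, and `masked_updates` are hypothetical, chosen only for illustration.

```python
import random

PRIME = 2**61 - 1  # field modulus (a Mersenne prime); an assumption of this sketch


def split_secret(secret, t, n, rng):
    """Split `secret` into n Shamir shares; any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def masked_updates(updates, rng):
    """Assumed masking scheme: for each pair (i, j), client i adds a
    random mask and client j subtracts the same mask, so all masks
    cancel when the server sums the uploads."""
    masked = [u % PRIME for u in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            m = rng.randrange(PRIME)
            masked[i] = (masked[i] + m) % PRIME
            masked[j] = (masked[j] - m) % PRIME
    return masked


rng = random.Random(0)  # seeded for reproducibility; not cryptographically secure

shares = split_secret(123456789, t=3, n=5, rng=rng)
recovered = reconstruct(shares[1:4])  # any 3 of the 5 shares suffice

updates = [10, 20, 30]                # toy integer-encoded parameters
masked = masked_updates(updates, rng)
aggregate = sum(masked) % PRIME       # masks cancel, leaving the true sum
```

In this toy run, any three of the five shares recover the secret, and the sum of the masked uploads equals the sum of the true updates. A real deployment would need a cryptographically secure random source, such as Python's `secrets` module, instead of `random.Random`.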

CAO Tianya, ZHANG Yufan, JIA Junjie. A Federal Learning Privacy Protection Algorithm with Effectively Improve Data Availability[J]. Computer Engineering, 2026, 52(6): 249-257.

https://www.ecice06.com/CN/Y2026/V52/I6/249

Figures/Tables: 8

Fig.1 The overall framework of IDEFed

Fig.2 Procedure of random mask perturbation

Fig.3 Comparison of time cost of algorithms under different distributions

Fig.4 Time overhead under different numbers of clients

Fig.5 Comparison of time cost under different network models

Fig.6 Comparison of accuracy among different algorithms on the MNIST dataset

Fig.7 Comparison of accuracy among different algorithms on the CIFAR-10 dataset

Fig.8 Accuracy under different data distributions

