
Computer Engineering

   

Research on a Defense Method for Label Flipping Attacks Oriented to Federated Learning

  

  • Published: 2026-06-22

一种面向联邦学习标签翻转攻击的防御方法研究

Abstract: Federated learning is a distributed learning architecture that lets clients jointly train a global model without sharing their local data, which helps balance the trade-off between privacy and efficiency. Its distributed nature, however, also leaves it vulnerable to data poisoning attacks, in which malicious clients tamper with their local training data to inject biased or incorrect updates into the global model, either to lower model accuracy or to manipulate model behavior on specific inputs. The label flipping attack is a classic data poisoning method: it is simple to implement and computationally cheap, and it modifies only the labels of local data while leaving the features unchanged. It is hard to detect with conventional statistical analysis, yet it can effectively degrade the accuracy of the global model or implant a backdoor. To improve global model accuracy and overall system security in federated learning, the server usually screens and filters the model update parameters uploaded by each client before global aggregation, identifying malicious client behavior and then performing robust aggregation to resist data poisoning. To this end, this paper proposes a Label Flipping Attack Defense Algorithm (LFADA) for federated learning, which aims to improve model accuracy and security under data poisoning. LFADA relies on a log-likelihood scoring mechanism. First, it flattens the updated model parameters of each client and reduces their dimensionality to construct a sample set. Second, it fits a Gaussian Mixture Model (GMM) to this processed sample set. Third, it uses the Log-Likelihood Score (LLS) to quantify how probable each client's update is under the model, yielding a "normality" score for each client.
Next, based on the current set of parameters, a threshold score is set at the required quantile, and clients scoring below it are treated as malicious. Their update parameters are discarded, and only the updates of clients that pass the filter are aggregated, which achieves unsupervised anomaly detection and filtering of client updates together with secure aggregation of the global model. Experiments are conducted on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, using a Convolutional Neural Network (CNN) with three convolutional blocks as the common base model and mounting label flipping attacks on each dataset. Results on model accuracy and attack success rate show that LFADA effectively resists label flipping attacks when the proportion of malicious clients is 0.1, 0.2, 0.3, or 0.5, and that it still performs well at the highest proportion of 0.5. Compared with nine mainstream algorithms, including Multi-Krum, Median, FoolsGold, and LFighter, LFADA improves model accuracy by 3.28%, 3.38%, and 2.62% on average while keeping the attack success rate low overall: below 3% on MNIST and Fashion-MNIST, and significantly lower than most methods on CIFAR-10. Its performance remains close to that of a Federated Averaging (FedAvg) model trained without any poisoning attack. In terms of stability, federated learning with LFADA remains stable throughout training. On the more complex Fashion-MNIST and CIFAR-10 datasets in particular, it shows no large fluctuations and its variation stays within a controllable range, making it clearly more stable than the other algorithms.
Time overhead experiments show that, compared with the baseline algorithms, LFADA significantly reduces time overhead while achieving the same accuracy and attack success rate.
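To make the threat model concrete, the following is a minimal Python sketch of a label flipping attack on one client's labels, together with one common way of measuring attack success. The function names `flip_labels` and `attack_success_rate`, the source/target class pair, and the definition of attack success rate (the fraction of source-class samples that the model predicts as the target class) are assumptions made for illustration; the abstract does not specify how the paper defines them.

```python
from typing import List, Sequence


def flip_labels(labels: Sequence[int], source: int, target: int) -> List[int]:
    """Return a copy of `labels` with every `source` label replaced by `target`.

    Only the labels change; the corresponding features are left untouched,
    which is what makes this attack cheap for a malicious client.
    """
    return [target if y == source else y for y in labels]


def attack_success_rate(
    true_labels: Sequence[int],
    predictions: Sequence[int],
    source: int,
    target: int,
) -> float:
    """Fraction of `source`-class samples predicted as `target`.

    This definition is an assumption for illustration, not taken from the paper.
    """
    source_preds = [p for y, p in zip(true_labels, predictions) if y == source]
    if not source_preds:
        return 0.0
    return sum(p == target for p in source_preds) / len(source_preds)


if __name__ == "__main__":
    clean = [7, 1, 7, 3, 1]          # hypothetical local labels
    poisoned = flip_labels(clean, source=7, target=1)
    print(poisoned)

    preds = [1, 1, 7, 3, 1]          # hypothetical model predictions
    print(attack_success_rate(clean, preds, source=7, target=1))
```

A malicious client would train on the poisoned labels and upload the resulting update, so the server only ever sees the update, not the flipped labels themselves.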

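The scoring and filtering steps described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each client update is assumed to have already been flattened and reduced to a single scalar feature, the mixture parameters are assumed to have been fitted beforehand (for example by expectation-maximization), and the surviving updates are combined by a plain unweighted mean. All function names, parameter values, and the quantile `q` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gmm_log_likelihood(
    x: float,
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Log-likelihood of scalar `x` under a 1-D Gaussian mixture.

    Uses the log-sum-exp trick so that very unlikely points do not underflow.
    """
    log_terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def quantile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated q-quantile of `values`, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_and_aggregate(
    updates: Sequence[Sequence[float]],
    scores: Sequence[float],
    q: float,
) -> Tuple[List[int], List[float]]:
    """Drop clients scoring below the q-quantile, then average the rest."""
    threshold = quantile(scores, q)
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    dim = len(updates[0])
    mean_update = [sum(updates[i][d] for i in kept) / len(kept) for d in range(dim)]
    return kept, mean_update


if __name__ == "__main__":
    # Hypothetical flattened updates from five clients; the last one is an outlier.
    updates = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0], [-5.0, 9.0]]
    # Hypothetical 1-D features after dimensionality reduction.
    reduced = [0.1, -0.2, 0.0, 0.15, 3.0]
    # Hypothetical, already-fitted mixture parameters.
    weights, means, variances = [0.5, 0.5], [-0.1, 0.1], [0.05, 0.05]

    scores = [gmm_log_likelihood(x, weights, means, variances) for x in reduced]
    kept, aggregated = filter_and_aggregate(updates, scores, q=0.2)
    print(kept, aggregated)
```

Because the threshold is a quantile of the current round's scores, the filter always discards roughly the same fraction of clients, so `q` would need to be chosen with the expected proportion of malicious clients in mind.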