
Computer Engineering ›› 2024, Vol. 50 ›› Issue (8): 153-164. doi: 10.19678/j.issn.1000-3428.0068223

• Cyberspace Security •

  • Supported by: National Natural Science Foundation of China (61702321)

Dual-Client Selection Algorithm Based on Model Similarity and Local Loss

Hongjiao LI, Baojin WANG*(), Zhaohui WANG, Renhao HU   

  1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China
  • Received:2023-08-11 Online:2024-08-15 Published:2023-12-29
  • Contact: Baojin WANG


Abstract:

Federated learning is a distributed machine learning technique that builds a global model collaboratively by aggregating local model parameters from clients. Existing client selection algorithms for federated learning operate either before or after local training. With statistically heterogeneous client data, pre-training selection algorithms may admit poorly performing clients into aggregation, reducing model accuracy, whereas post-training selection algorithms require all clients to participate in training, incurring significant communication overhead. To address these issues, this study proposes a Dual-Client Selection (DCS) algorithm, which selects a subset of clients before local training to reduce global-model downloads and, after the subset has trained, selects some of those clients for aggregation to reduce local-model uploads. Before local training, the server performs hierarchical clustering based on the cosine similarity between the local and global models, yielding different selection probability distributions from which an unbiased training subset is drawn, thereby better adapting to the statistical heterogeneity of the client data. After subset training, the server considers not only the local loss but also the cosine similarity between the local and global models to filter the aggregation subset, improving the accuracy of the global model. Experimental results on the Fashion-MNIST and CIFAR-10 datasets demonstrate that the DCS algorithm improves test accuracy by up to 8.55 percentage points compared with the baseline algorithms, with uplink and downlink communication overheads of O(mn+2d) and O(dn+m), respectively.
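The abstract outlines two server-side selection stages: hierarchical clustering on the cosine similarity between each client's local model and the global model before training, and a ranking that combines local loss with the same similarity after training. The sketch below illustrates these two stages under stated assumptions only: the flattened weight vectors, the Ward linkage, the proportional per-cluster quota, and the weighted score with parameter `alpha` are illustrative choices, not the authors' exact formulation.

```python
# Illustrative sketch of a two-stage (dual) client selection, loosely
# following the DCS abstract. All concrete choices below (Ward linkage,
# proportional quotas, the alpha-weighted score) are assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cosine_similarity(a, b):
    """Cosine similarity between two flattened weight vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)


def select_training_subset(local_ws, global_w, m, k, rng):
    """Stage 1 (before training): cluster clients by their similarity to
    the global model, then sample across clusters in proportion to
    cluster size so the draw stays (approximately) unbiased."""
    sims = np.array([cosine_similarity(w, global_w) for w in local_ws])
    labels = fcluster(linkage(sims[:, None], method="ward"),
                      t=k, criterion="maxclust")
    chosen = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # proportional allocation: each cluster contributes by its share
        quota = max(1, round(m * len(members) / len(local_ws)))
        chosen.extend(rng.choice(members, size=min(quota, len(members)),
                                 replace=False))
    return sorted(int(i) for i in chosen)[:m]


def select_aggregation_subset(subset, losses, sims, m_agg, alpha=0.5):
    """Stage 2 (after training): rank the trained clients by a weighted
    score of normalized local loss and similarity to the global model
    (the exact combination in the paper is not given in the abstract)."""
    losses = np.asarray(losses, dtype=float)
    span = losses.max() - losses.min()
    loss_n = (losses - losses.min()) / (span + 1e-12)
    score = alpha * loss_n + (1 - alpha) * np.asarray(sims, dtype=float)
    order = np.argsort(-score)  # highest score first
    return [subset[i] for i in order[:m_agg]]
```

A round would then download the global model only to the stage-1 subset and upload models only from the stage-2 subset, which is where the reduced uplink/downlink overhead in the abstract comes from.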

Key words: federated learning, client selection, model similarity, clustering, local loss