
Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 20-28. doi: 10.19678/j.issn.1000-3428.0066689

• Hot Topics and Reviews •

Personalized Federated Learning Algorithm Based on Mutual Information and Soft Clustering

Meiguang ZHENG, Yong YANG

  1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
  • Received: 2023-01-04 Online: 2023-08-15 Published: 2023-08-15
  • About the authors:

    Meiguang ZHENG (born 1983), female, associate professor, Ph.D.; main research interests: cloud computing and big data

    Yong YANG, master's student

  • Funding:
    National Natural Science Foundation of China (62172442); National Natural Science Foundation of China (62172451); Hunan Provincial Natural Science Foundation, Youth Science Fund Project (2020JJ5775)

Abstract:

Federated learning is a distributed machine learning technique in which multiple clients collaboratively train a machine learning model while the privacy of client data is protected. However, the heterogeneity of client data limits the application of federated learning, and personalized federated learning is a viable solution to this problem. Traditional clustering-based personalized federated learning schemes group clients with the same data distribution into one cluster, exploiting the partial homogeneity of client data to reduce the impact of data heterogeneity on federated learning; however, they ignore the possibility that a client belongs to multiple clusters. Based on the idea that client data approximately follow multiple data distributions, a personalized Federated learning algorithm based on Mutual information and Soft clustering (pFedMS) is proposed. To address the shortcoming that current client clustering metrics in federated learning cannot accurately reflect the similarity of model features, a mutual information formula based on model features is introduced as the clustering metric, which effectively distinguishes similar clients. A method for measuring clustering rationality based on intra-class and inter-class distances is proposed and used to dynamically adjust the clustering results. The similarity between clients and clusters is calculated from membership degrees, which allows a client to belong to multiple clusters simultaneously and improves the performance of the clustering algorithm. Experimental results on the CIFAR-10 and Fashion-MNIST (FMNIST) datasets show that, compared with baseline algorithms such as FedAvg and CFL, pFedMS improves the Best Mean Testing Accuracy (BMTA) of clients by 2.4 to 3.0 percentage points.

Key words: personalized federated learning, data bias, soft clustering, model feature, mutual information
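
The abstract does not state the mutual information formula or the membership rule, so the following is only an illustrative sketch of the general idea and not the pFedMS implementation. It assumes that each client and each cluster center is summarized by a flattened feature vector, estimates mutual information with a simple joint histogram, and converts the scores into soft memberships with a fuzzy-c-means-style normalization. The function names, the `bins` and `m` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X;Y) in nats for two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def soft_memberships(client_feats, center_feats, m=2.0, eps=1e-12):
    """Assumed fuzzy-c-means-style memberships; each row sums to 1."""
    mi = np.array([[mutual_information(c, z) for z in center_feats]
                   for c in client_feats])
    dist = 1.0 / (mi + eps)                   # higher MI -> smaller distance
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


# Synthetic demo: two cluster centers and three clients (all data is made up).
rng = np.random.default_rng(0)
centers = [rng.normal(size=2000), rng.normal(size=2000)]
clients = [
    centers[0] + 0.1 * rng.normal(size=2000),   # close to center 0
    centers[1] + 0.1 * rng.normal(size=2000),   # close to center 1
    0.5 * (centers[0] + centers[1]),            # mixture of both centers
]
u = soft_memberships(clients, centers)
print(np.round(u, 3))
```

In this sketch the third client, whose features mix both centers, receives non-negligible membership in both clusters, which is the behavior that hard clustering cannot express.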