
Computer Engineering ›› 2023, Vol. 49 ›› Issue (1): 22-30. doi: 10.19678/j.issn.1000-3428.0064301

• Hot Topics and Reviews •

Cluster Federated Optimization Model Based on Hyperledger Fabric

LI Youhuizi1, YU Haitao1, YIN Yuyu1, GAO Honghao2,3

  1. School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China;
    2. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China;
    3. Department of Computer Engineering, Gachon University, Seongnam 461701, Republic of Korea
  • Received: 2022-03-25  Revised: 2022-10-14  Published: 2022-11-16
  • About the authors: LI Youhuizi (1989-), female, associate professor, Ph.D.; her main research interests include edge computing and privacy protection. YU Haitao is a master's degree candidate. YIN Yuyu and GAO Honghao are professors holding Ph.D. degrees.
  • Funding: Zhejiang Province "Pioneer" and "Leading Goose" R&D Program, "Research and Application Demonstration of Intelligent Diagnosis and Treatment Technology for Post-Stroke Cognitive Impairment Based on Heterogeneous Multimodal Data Fusion" (2022C03043); Zhejiang Provincial Natural Science Foundation, "Research on Key Technologies of Full-Path Trajectory Privacy Protection in the Internet of Vehicles" (LY22F020018).


Abstract: As a distributed machine learning framework, Federated Learning (FL) achieves collaborative training by sharing model parameters while keeping data local, which addresses privacy protection to a certain extent. However, FL still faces several challenges: the central parameter server is a single point of failure, potentially malicious clients can launch gradient attacks, and skewed client data distributions degrade training performance. To address these problems, decentralized blockchain technology is combined with FL and a cluster federated optimization model based on Hyperledger Fabric is proposed. The model uses Hyperledger Fabric as the architectural basis for distributed training. After initialization, each client trains locally and transmits its model parameters and data-distribution information to the Hyperledger Fabric ledger, and clustering is used to improve the training performance of FL when client data are Non-IID. On this basis, a client is randomly elected as the leader, which takes over the role of the central server: the leader clusters clients according to distribution similarity and cosine similarity, downloads the corresponding model parameters, and aggregates them per cluster. Finally, each client obtains its cluster's aggregated model and continues iterative training. On the EMNIST dataset with Non-IID data, the average accuracy of the proposed model is 79.26%, which is 17.26% higher than that of FedAvg; while maintaining accuracy, the number of communication rounds required to reach convergence is 36.3% lower than that of Cluster Federated Learning (CFL).
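To make the leader's role concrete, the following Python/NumPy sketch illustrates only the clustering-and-aggregation step described in the abstract: clients are grouped when both their label-distribution similarity and the cosine similarity of their parameter updates exceed thresholds, and a FedAvg-style weighted average is computed inside each cluster. The similarity measures, the thresholds tau_w and tau_d, the greedy grouping rule, and all names are illustrative assumptions rather than the paper's exact algorithm; the Hyperledger Fabric layer (smart contracts for uploading and downloading parameters, leader election) is omitted.

import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two flattened parameter-update vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def distribution_sim(p, q):
    # Similarity of two label-distribution vectors (1 - total variation distance).
    return 1.0 - 0.5 * float(np.abs(p - q).sum())

def cluster_clients(updates, dists, tau_w=0.8, tau_d=0.7):
    # Greedily group clients whose updates AND label distributions are both similar
    # to the first member of an existing cluster; otherwise open a new cluster.
    clusters = []  # each cluster is a list of client indices
    for i in range(len(updates)):
        for c in clusters:
            rep = c[0]
            if (cosine_sim(updates[i], updates[rep]) >= tau_w
                    and distribution_sim(dists[i], dists[rep]) >= tau_d):
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def aggregate(updates, samples, cluster):
    # FedAvg-style aggregation: average the parameters inside one cluster,
    # weighted by each client's local sample count.
    weights = np.array([samples[i] for i in cluster], dtype=float)
    weights /= weights.sum()
    return sum(w * updates[i] for w, i in zip(weights, cluster))

# Toy example: 4 clients, two pairs with similar updates and skewed label distributions.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=10), rng.normal(size=10)
updates = [base_a + 0.1 * rng.normal(size=10), base_a + 0.1 * rng.normal(size=10),
           base_b + 0.1 * rng.normal(size=10), base_b + 0.1 * rng.normal(size=10)]
dists = [np.array([0.9, 0.1]), np.array([0.85, 0.15]),
         np.array([0.1, 0.9]), np.array([0.15, 0.85])]
samples = [120, 80, 150, 100]
for cluster in cluster_clients(updates, dists):
    print(cluster, np.round(aggregate(updates, samples, cluster)[:3], 3))

In a full deployment, the uploads and downloads in this sketch would be replaced by smart-contract transactions against the Hyperledger Fabric ledger, as the abstract describes.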

Key words: blockchain, Hyperledger Fabric, Federated Learning (FL), privacy protection, smart contract

CLC number: