
Computer Engineering


A multi-technology fusion data trading method based on federated learning

Published: 2023-11-14

Abstract: Data-protection constraints confine data within individual enterprises and organizations, creating numerous "data islands" whose considerable value is difficult to exploit. Federated learning (FL) makes it possible to share data across organizations, but unclear benefit-distribution schemes, high communication costs, and centralization prevent it from meeting the multifaceted requirements of data trading scenarios. To address these issues, this paper proposes a multi-technology fusion data trading method based on federated learning (MTFDT). The method designs an incentive mechanism that combines a trusted execution environment (TEE) with the Shapley value, and it optimizes model data synchronization during trading through a tree-topology synchronization scheme that reduces the synchronization time complexity from linear to logarithmic in the number of participants. In addition, a blockchain-based storage scheme for benefit-distribution data and model data makes the transaction records tamper-proof and enables accountability through traceability. Finally, simulation comparisons on public datasets show that MTFDT accurately evaluates model training effectiveness and improves the fairness of benefit distribution. Compared with existing schemes, MTFDT reduces model synchronization time by up to 34% and requires less bandwidth, further verifying the effectiveness of the proposed scheme in data trading scenarios.
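The abstract states that MTFDT's incentive mechanism builds on the Shapley value but does not give the utility function or explain how the computation is placed inside the TEE. As a minimal sketch, assuming a coalition's utility is simply the test accuracy of a model trained on that coalition's data (the `ACCURACY` table below is hypothetical, not taken from the paper), exact Shapley values can be computed by averaging each participant's marginal contribution over all coalitions S of the other participants, weighted by |S|!(n-|S|-1)!/n!. The TEE is omitted entirely.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   utility: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of every participant under a coalition utility function."""
    n = len(players)
    values: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight |S|! (n - |S| - 1)! / n! shared by every coalition S of size k without p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition S.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical utility: accuracy of a model trained on each coalition's pooled data.
ACCURACY = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.60, frozenset({"B"}): 0.60, frozenset({"C"}): 0.30,
    frozenset({"A", "B"}): 0.80, frozenset({"A", "C"}): 0.70, frozenset({"B", "C"}): 0.70,
    frozenset({"A", "B", "C"}): 0.90,
}

values = shapley_values(["A", "B", "C"], ACCURACY.__getitem__)
print({p: round(v, 4) for p, v in values.items()})
# {'A': 0.3667, 'B': 0.3667, 'C': 0.1667}
```

The values sum to the grand-coalition utility (0.9 here), the efficiency property that makes Shapley values a natural basis for splitting a reward budget proportionally. Exact computation visits 2^(n-1) coalitions per participant, so larger federations usually rely on sampling-based approximations; whether MTFDT does so is not stated in the abstract.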
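The abstract reports that the tree-topology synchronization scheme lowers synchronization time complexity from linear to logarithmic but does not describe the tree itself. Under the simplifying assumption that each node completes one model transfer per round, the sketch below contrasts one node pushing the model to every peer in turn (n - 1 rounds) with a binomial-tree broadcast, used here as a generic stand-in for MTFDT's topology, in which the number of nodes holding the model doubles every round (ceil(log2 n) rounds). The function names are hypothetical.

```python
from typing import List, Tuple


def linear_sync_rounds(n: int) -> int:
    """Rounds needed when one node pushes the model to the other n - 1 nodes one at a time."""
    return max(n - 1, 0)


def tree_sync_schedule(n: int) -> List[List[Tuple[int, int]]]:
    """Binomial-tree broadcast from node 0.

    In each round every node that already holds the model sends it to one node
    that does not, so the number of holders doubles per round. Returns the
    (sender, receiver) pairs of each round.
    """
    schedule: List[List[Tuple[int, int]]] = []
    holders = 1  # nodes 0 .. holders-1 already have the model
    while holders < n:
        schedule.append([(src, src + holders) for src in range(holders) if src + holders < n])
        holders = min(2 * holders, n)
    return schedule


for n in (8, 16, 64):
    print(n, linear_sync_rounds(n), len(tree_sync_schedule(n)))
# 8 7 3
# 16 15 4
# 64 63 6
```

This counts idealized rounds only; it ignores link bandwidth and model size, so it should not be read as reproducing the up-to-34% time reduction measured in the paper.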
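The abstract also describes storing benefit-distribution data and model data on a blockchain so that records are tamper-proof and traceable, without naming a platform or record format. The tamper-evidence part of that idea can be illustrated with a single-process hash chain: each block stores the SHA-256 digest of its payload together with the previous block's hash, so editing any stored record makes verification fail. This is a minimal sketch with no consensus, replication, or signatures, and the class name and record fields (`round`, `model_sha256`, `rewards`) are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def _digest(payload: Dict[str, Any], prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so identical records always hash identically.
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class HashChainLedger:
    blocks: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, payload: Dict[str, Any]) -> str:
        """Append a record linked to the previous block and return its hash."""
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS_PREV
        h = _digest(payload, prev)
        self.blocks.append({"payload": payload, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute every hash; any edited payload or broken link returns False."""
        prev = GENESIS_PREV
        for block in self.blocks:
            if block["prev"] != prev or _digest(block["payload"], prev) != block["hash"]:
                return False
            prev = block["hash"]
        return True


ledger = HashChainLedger()
ledger.append({"round": 1, "model_sha256": hashlib.sha256(b"global-model-round-1").hexdigest()})
ledger.append({"round": 1, "rewards": {"A": 40.0, "B": 40.0, "C": 20.0}})
print(ledger.verify())  # True
ledger.blocks[1]["payload"]["rewards"]["C"] = 30.0  # simulate tampering with a stored record
print(ledger.verify())  # False
```

In an actual blockchain the chain would be replicated across participants under a consensus protocol; the sketch shows only why recomputing the hashes exposes an edited record and lets an auditor trace which block was changed.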