
Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 32-42, 51. doi: 10.19678/j.issn.1000-3428.0065549

• Hot Topics and Reviews •

An Efficient Non-Interactive and Privacy-Preserving Logistic Regression Model

TANG Min, ZHANG Yuhao, DENG Guoqiang

  1. Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, School of Mathematics and Computing Science, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
  • Received: 2022-08-19 Revised: 2022-09-30 Published: 2022-11-03
  • About the authors: TANG Min (born 1980), female, associate professor, Ph.D., main research interests: computer algebra and machine learning; ZHANG Yuhao, master's student; DENG Guoqiang (corresponding author), associate professor, Ph.D. candidate.
  • Funding:
    Guangxi Science and Technology Base and Talent Special Project (AD18281024); Innovation Project of Guilin University of Electronic Technology Graduate Education (2022YCXS144).

Abstract: Logistic regression is a typical machine learning algorithm that is widely used in medical diagnosis, financial forecasting, and other fields. A single user rarely has enough samples to build a high-accuracy model, while traditional centralized training leaks private data, so privacy-preserving logistic regression has attracted extensive attention. Existing schemes that require interaction between users and servers incur high computation costs and communication overhead. This paper proposes an efficient non-interactive logistic regression training protocol. Using a gradient update formula with a well-separable structure, the protocol decouples the computation on sample data from the computation on model parameters, which guarantees a one-way, single-round transmission between users and servers: users aggregate their local data, upload it to the cloud servers in secret-shared form, and can then go offline. In the training phase, a protocol based on matrix and vector operations ensures that the servers use the same fixed information to update the parameters in every iteration, reducing computation cost and communication overhead. A security analysis of the protocol and numerical experiments are also provided. Experiments that train logistic regression models on four real datasets from the UCI repository show that, without sacrificing model accuracy, the proposed model is 80 to 120 times more efficient than VANE, the latest privacy-preserving logistic regression scheme, and its training time is close to that of training in the plaintext domain.

Key words: logistic regression, privacy-preserving, well-separable structure, secret sharing, vectorization
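
The abstract does not state the well-separable gradient formula itself, so the following Python sketch rests on assumptions rather than on the paper's actual protocol. It assumes the sigmoid is replaced by its first-order approximation, sigmoid(z) ≈ 0.5 + 0.25 z, which turns the gradient into 0.25 (XᵀX) θ − Xᵀ(y − 0.5), so that the data-dependent terms XᵀX and Xᵀ(y − 0.5) can be computed once per user. It also assumes two servers, additive secret sharing over 64-bit integers with a fixed-point scale of 2^16, and hypothetical helper names (`share`, `local_statistics`, `train`). For readability the aggregated statistics are reconstructed in the clear before training; a real protocol would presumably keep them in shared form.

```python
import numpy as np

SCALE = 2 ** 16  # fixed-point scale (an assumption, not taken from the paper)

def encode(x):
    """Map real values to fixed-point int64."""
    return np.round(np.asarray(x) * SCALE).astype(np.int64)

def decode(z):
    """Map fixed-point int64 back to floats."""
    return z.astype(np.float64) / SCALE

def share(x, rng):
    """Split x into two additive shares modulo 2^64 (int64 wraparound)."""
    enc = encode(x)
    mask = np.frombuffer(rng.bytes(8 * enc.size), dtype=np.int64).reshape(enc.shape)
    return mask, enc - mask  # the two shares sum to enc modulo 2^64

def local_statistics(X, y):
    """Data-only terms of the assumed separable gradient:
    grad = 0.25 * (X^T X) theta - X^T (y - 0.5)."""
    return X.T @ X, X.T @ (y - 0.5)

def train(A, b, n, lr=0.1, iters=200):
    """Gradient descent that reuses the same fixed A and b in every iteration."""
    theta = np.zeros(A.shape[0])
    for _ in range(iters):
        theta -= lr * (0.25 * A @ theta - b) / n
    return theta

rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
users = []
for _ in range(3):  # three users, each holding a small private dataset
    X = rng.normal(size=(50, 3))
    y = (X @ true_theta + 0.1 * rng.normal(size=50) > 0).astype(float)
    users.append((X, y))

# Each user uploads one share of its statistics to each server, then goes offline.
server0_A, server1_A, server0_b, server1_b = [], [], [], []
for X, y in users:
    A_u, b_u = local_statistics(X, y)
    a0, a1 = share(A_u, rng)
    b0, b1 = share(b_u, rng)
    server0_A.append(a0); server1_A.append(a1)
    server0_b.append(b0); server1_b.append(b1)

# Servers add their shares locally; reconstruction here is for illustration only.
A_rec = decode(sum(server0_A) + sum(server1_A))
b_rec = decode(sum(server0_b) + sum(server1_b))

# Plaintext reference computed on the pooled data.
X_all = np.vstack([X for X, _ in users])
y_all = np.concatenate([y for _, y in users])
A_plain, b_plain = local_statistics(X_all, y_all)

n = len(y_all)
theta = train(A_rec, b_rec, n)
theta_plain = train(A_plain, b_plain, n)
```

Because the statistics are additive across users, the servers never need to contact a user again: every iteration of `train` touches only the fixed aggregates. Under these assumptions, `theta` and `theta_plain` should differ only by the fixed-point rounding error.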

CLC number: