An Efficient Non-Interactive and Privacy-Preserving Logistic Regression Model

doi:10.19678/j.issn.1000-3428.0065549

Abstract: Logistic regression is a typical machine learning algorithm that is widely used in medical diagnosis, financial forecasting, and other fields. Because a single user rarely has enough samples to build a high-precision model, and traditional centralized training leads to privacy leakage, privacy-preserving logistic regression has attracted extensive attention. Existing schemes require interaction between users and servers, which incurs high computation costs and a heavy communication burden. This paper proposes an efficient non-interactive logistic regression training protocol. By using a gradient update formula with a well-separable structure, the protocol decouples the computation on sample data from the model parameters, so that each user needs only a single one-way transmission to the servers: users can go offline after integrating their local data and uploading it to the cloud servers in secret-shared form. In the training phase, a protocol based on matrix and vector operations ensures that the servers update the parameters from the same fixed information in every iteration, which reduces computation cost and communication overhead. A security analysis of the protocol and numerical experiments are also provided. Experiments that train logistic regression models on four real datasets from the UCI repository show that, while preserving model accuracy, the protocol is 80 to 120 times more efficient than VANE, the latest privacy-preserving logistic regression scheme, and that its training time is close to that of training in the plaintext domain.

Key words: logistic regression, privacy-preserving, well-separable structure, secret sharing, vectorization
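
The decoupling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's protocol: it assumes a first-order approximation of the sigmoid, sigma(z) ≈ 1/2 + z/4, under which the gradient X^T(sigma(X·theta) - y) becomes (X^T X)·theta/4 - X^T(y - 1/2), so each user can upload the parameter-independent summaries X^T X and X^T(y - 1/2) once and then go offline. The fixed-point scale, the ring size, the two-server additive sharing, and all function names are assumptions made for illustration. The final reconstruction of the aggregate is done only to check correctness; a real protocol would keep iterating on shares.

```python
import random

SCALE = 1 << 16   # fixed-point precision (assumed for illustration)
MOD = 1 << 64     # ring used for additive shares (assumed for illustration)

def encode(v):
    """Map a real number to a fixed-point element of the ring."""
    return round(v * SCALE) % MOD

def decode(s):
    """Map a ring element back to a signed real number."""
    s %= MOD
    if s >= MOD // 2:
        s -= MOD
    return s / SCALE

def local_summary(X, y):
    """Parameter-independent statistics: flattened X^T X followed by X^T (y - 1/2)."""
    d = len(X[0])
    A = [sum(r[i] * r[j] for r in X) for i in range(d) for j in range(d)]
    b = [sum(r[i] * (t - 0.5) for r, t in zip(X, y)) for i in range(d)]
    return A + b

def upload(summary, rng):
    """Split each entry into two additive shares, one per server (single upload)."""
    share0 = [rng.randrange(MOD) for _ in summary]
    share1 = [(encode(v) - r) % MOD for v, r in zip(summary, share0)]
    return share0, share1

def aggregate(share_vectors):
    """Each server adds the shares it received from all users, entry by entry."""
    return [sum(col) % MOD for col in zip(*share_vectors)]

def train(summary, d, n, lr=0.5, iters=200):
    """Gradient descent that reuses the same fixed summaries in every iteration."""
    A, b = summary[:d * d], summary[d * d:]
    theta = [0.0] * d
    for _ in range(iters):
        grad = [sum(A[i * d + j] * theta[j] for j in range(d)) / 4 - b[i]
                for i in range(d)]
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta

# Two hypothetical users; the first feature column is a constant bias term.
rng = random.Random(0)
users = [([[1.0, 0.2], [1.0, 0.4], [1.0, 0.9]], [0, 0, 1]),
         ([[1.0, 0.1], [1.0, 0.7], [1.0, 0.8]], [0, 1, 1])]
uploads = [upload(local_summary(X, y), rng) for X, y in users]
server0 = aggregate([u[0] for u in uploads])
server1 = aggregate([u[1] for u in uploads])
# Reconstruction is for checking only; it is not part of a secure protocol.
combined = [decode(a + c) for a, c in zip(server0, server1)]
theta = train(combined, d=2, n=6)
print(theta)
```

Because the summaries do not depend on the model parameters, the per-iteration work in this sketch is one matrix-vector product on fixed data, which is the property the abstract attributes to the well-separable structure.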

CLC Number: TP309

TANG Min, ZHANG Yuhao, DENG Guoqiang. An Efficient Non-Interactive and Privacy-Preserving Logistic Regression Model[J]. Computer Engineering, 2023, 49(4): 32-42,51.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0065549

http://www.ecice06.com/EN/Y2023/V49/I4/32
