
Computer Engineering

Differential Low-Rank Adaptation-based Sensitive Information Protection for Large Language Model Training

  • Published: 2025-10-28


Abstract: As generative AI technologies are increasingly deployed in sensitive industries, large generative models' tendency to memorize training data during fine-tuning poses a growing privacy risk: user identities, behavioral traces, and other sensitive information may be reconstructed at inference time. To address this issue, a fine-tuning approach combining Differential Privacy (DP) with Low-Rank Adaptation (LoRA) is proposed. The method freezes the parameters of the pre-trained model and updates only the inserted LoRA modules. In addition, Differentially Private Stochastic Gradient Descent (DP-SGD) is applied during training, performing gradient norm clipping and Gaussian noise injection on a per-sample basis to limit the model's dependence on individual training samples. Based on the Qwen2-1.5B language model, an instruction-style fine-tuning dataset incorporating user profiles is constructed, and adversarial samples targeting typical sensitive fields, such as identity markers, behavioral characteristics, and location data, are designed to compare the anti-leakage capability of traditional full-parameter fine-tuning with that of the DP-LoRA approach. Experimental results show that the fully fine-tuned model reaches a sensitive-information match rate of 73.07% on 130 adversarial samples, indicating severe privacy vulnerability, whereas the DP-LoRA fine-tuned model reduces the match rate to 1.5%, with generated content showing minimal correlation to the original training data. The proposed method effectively lowers the probability of sensitive-information reproduction, offering a low-cost, highly adaptable training strategy for deploying generative models in real-world scenarios with stringent data security requirements.
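To make the parameter-efficient side of the approach concrete, below is a minimal PyTorch sketch of the LoRA mechanism the abstract describes: the pre-trained weight matrix is frozen and only the inserted low-rank factors receive gradient updates. The class and argument names (LoRALinear, r, alpha) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update.

    forward(x) = W x + (alpha / r) * B A x, where W stays frozen and only
    A (r x in_features) and B (out_features x r) are trained.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze the pre-trained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        # B is zero-initialized, so the wrapped layer starts identical to the base layer
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

In practice one would wrap, for example, the attention projection layers of Qwen2-1.5B this way and pass only the lora_A and lora_B factors to the optimizer, leaving the base model untouched.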

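The privacy mechanism, per-sample gradient norm clipping followed by Gaussian noise injection, can be sketched as a microbatch-of-one training step, assuming every trainable parameter participates in the forward pass. Production implementations (e.g., the Opacus library) vectorize the per-sample gradients for speed; the function and parameter names below (dp_sgd_step, max_grad_norm, noise_multiplier) are hypothetical.

```python
import torch

def dp_sgd_step(model, loss_fn, inputs, targets, optimizer,
                max_grad_norm: float = 1.0, noise_multiplier: float = 1.0):
    """One DP-SGD update, processing the batch one sample at a time.

    Each per-sample gradient is clipped to L2 norm max_grad_norm, the clipped
    gradients are summed, Gaussian noise with standard deviation
    noise_multiplier * max_grad_norm is added, and the result is averaged.
    """
    # Only parameters that still require gradients, i.e. the LoRA factors
    params = [p for p in model.parameters() if p.requires_grad]
    batch_size = inputs.shape[0]
    summed = [torch.zeros_like(p) for p in params]

    for i in range(batch_size):                        # microbatch of size 1
        model.zero_grad()
        loss = loss_fn(model(inputs[i:i + 1]), targets[i:i + 1])
        loss.backward()
        # Global L2 norm of this sample's gradient across all trainable params
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        clip_coef = (max_grad_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for acc, p in zip(summed, params):
            acc += p.grad * clip_coef                  # accumulate the clipped gradient

    for acc, p in zip(summed, params):
        noise = torch.randn_like(acc) * (noise_multiplier * max_grad_norm)
        p.grad = (acc + noise) / batch_size            # noised average replaces the raw gradient
    optimizer.step()
    model.zero_grad()
```

The (ε, δ) privacy guarantee of the full training run then follows from standard accounting over the number of steps and the sampling rate, which this sketch omits.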