
Computer Engineering

   

Research on a Large Language Model Machine Translation Method Based on Local Preference Optimization

  

Published: 2025-09-18


Abstract: Reinforcement learning methods based on direct preference optimization (DPO) have shown excellent results on many downstream tasks of large language models. When applied directly to machine translation, however, this approach often suffers from over-optimization caused by its global reward maximization strategy: the model focuses excessively on matching the distribution of reference translations and thereby loses both local translation diversity and the potential for global improvement. To address this issue, the causes of the performance degradation of direct preference optimization in large language model machine translation were first investigated. On this basis, a large language model machine translation method based on local preference optimization was proposed. The method identifies frequently mistranslated low-frequency phrases through dynamic temperature sampling and reference-free evaluation of the model's outputs. A preference data construction method that combines global differences with local key differences is then introduced, and token-level global and local loss functions are designed to balance overall translation quality against local translation diversity. Finally, a two-phase curriculum learning strategy gradually adjusts the model's output preference for low-frequency phrases. The proposed method was validated on the FLORES-200 dataset over fourteen morphologically complex multilingual translation tasks. Experimental results show that the method achieves scores of 80.7 on XCOMET, 89.9 on COMET-22, and 30.2 in BLEU, outperforming several strong multilingual machine translation baselines in all translation directions, which confirms its effectiveness.
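To make the abstract's combination of a sequence-level (global) preference term, a token-level (local) term over low-frequency phrase spans, and a two-phase curriculum concrete, the following is a minimal sketch assuming a standard DPO-style setup in PyTorch. All names and values here (span_mask, lam_schedule, the (0.1, 0.5) weights) are hypothetical illustrations; the paper's actual loss formulation and hyperparameters are not reproduced.

```python
# Minimal sketch of a combined global/local preference loss under a
# standard DPO-style setup. All names and weights are hypothetical.
import torch
import torch.nn.functional as F

def global_dpo_loss(pol_chosen_lp, pol_rejected_lp,
                    ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Sequence-level DPO term: prefer the chosen translation over the
    rejected one, relative to a frozen reference model."""
    logits = beta * ((pol_chosen_lp - pol_rejected_lp)
                     - (ref_chosen_lp - ref_rejected_lp))
    return -F.logsigmoid(logits).mean()

def local_token_loss(token_logps, span_mask):
    """Token-level local term: mean negative log-likelihood restricted to
    tokens inside marked low-frequency phrase spans (span_mask == 1)."""
    span_lp = (token_logps * span_mask).sum(dim=-1)
    span_len = span_mask.sum(dim=-1).clamp(min=1)
    return -(span_lp / span_len).mean()

def combined_loss(pol_chosen_lp, pol_rejected_lp,
                  ref_chosen_lp, ref_rejected_lp,
                  chosen_token_logps, span_mask,
                  phase=1, lam_schedule=(0.1, 0.5), beta=0.1):
    """Two-phase curriculum: phase 1 uses a small local weight, phase 2 a
    larger one, gradually shifting the update toward low-frequency spans."""
    lam = lam_schedule[phase - 1]
    g = global_dpo_loss(pol_chosen_lp, pol_rejected_lp,
                        ref_chosen_lp, ref_rejected_lp, beta)
    l = local_token_loss(chosen_token_logps, span_mask)
    return g + lam * l
```

In this sketch, chosen_token_logps would be the policy's per-token log-probabilities for the chosen translation, and span_mask would come from the phrase-identification step (dynamic temperature sampling plus reference-free evaluation); the two-element schedule is an arbitrary placeholder for whatever weights the two curriculum phases actually use.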
