
Computer Engineering

   

Research on a Large Language Model Machine Translation Method Based on Local Preference Optimization

  

Published: 2025-09-18


Abstract: Reinforcement learning methods based on direct preference optimization (DPO) have shown excellent results on many downstream tasks of large language models. When applied directly to machine translation, however, this approach often suffers from over-optimization caused by its global reward maximization strategy: the model focuses excessively on matching the distribution of reference translations and thereby loses both local translation diversity and the potential for global improvement. To address this issue, the causes of the performance degradation of direct preference optimization in large language model machine translation were first investigated. On this basis, a large language model machine translation method based on local preference optimization was proposed. The method identifies frequently mistranslated low-frequency phrases through dynamic temperature sampling and reference-free evaluation of the model's outputs. A preference data construction method that combines global differences with local key differences is then introduced, and token-level global and local loss functions are designed to balance overall translation quality against local translation diversity. Finally, a two-phase curriculum learning strategy gradually adjusts the model's output preference for low-frequency phrases. The proposed method was validated on the FLORES-200 dataset over fourteen morphologically complex multilingual translation tasks. Experimental results show that the method achieves scores of 80.7 on XCOMET, 89.9 on COMET-22, and 30.2 in BLEU, outperforming several strong multilingual machine translation baselines in all translation directions, which confirms its effectiveness.
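To make the abstract's combination of a sequence-level (global) preference term, a token-level (local) term over low-frequency phrase spans, and a two-phase curriculum concrete, the following is a minimal sketch assuming a standard DPO-style setup in PyTorch. All names and values here (span_mask, lam_schedule, the (0.1, 0.5) weights) are hypothetical illustrations; the paper's actual loss formulation and hyperparameters are not reproduced.

```python
# Minimal sketch of a combined global/local preference loss under a
# standard DPO-style setup. All names and weights are hypothetical.
import torch
import torch.nn.functional as F

def global_dpo_loss(pol_chosen_lp, pol_rejected_lp,
                    ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Sequence-level DPO term: prefer the chosen translation over the
    rejected one, relative to a frozen reference model."""
    logits = beta * ((pol_chosen_lp - pol_rejected_lp)
                     - (ref_chosen_lp - ref_rejected_lp))
    return -F.logsigmoid(logits).mean()

def local_token_loss(token_logps, span_mask):
    """Token-level local term: mean negative log-likelihood restricted to
    tokens inside marked low-frequency phrase spans (span_mask == 1)."""
    span_lp = (token_logps * span_mask).sum(dim=-1)
    span_len = span_mask.sum(dim=-1).clamp(min=1)
    return -(span_lp / span_len).mean()

def combined_loss(pol_chosen_lp, pol_rejected_lp,
                  ref_chosen_lp, ref_rejected_lp,
                  chosen_token_logps, span_mask,
                  phase=1, lam_schedule=(0.1, 0.5), beta=0.1):
    """Two-phase curriculum: phase 1 uses a small local weight, phase 2 a
    larger one, gradually shifting the update toward low-frequency spans."""
    lam = lam_schedule[phase - 1]
    g = global_dpo_loss(pol_chosen_lp, pol_rejected_lp,
                        ref_chosen_lp, ref_rejected_lp, beta)
    l = local_token_loss(chosen_token_logps, span_mask)
    return g + lam * l
```

In this sketch, chosen_token_logps would be the policy's per-token log-probabilities for the chosen translation, and span_mask would come from the phrase-identification step (dynamic temperature sampling plus reference-free evaluation); the two-element schedule is an arbitrary placeholder for whatever weights the two curriculum phases actually use.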
