[1] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2014: 3104-3112.
[2] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[EB/OL]. (2016-05-19)[2020-09-10]. https://arxiv.org/pdf/1409.0473.pdf.
[3] MENG F, ZHANG J. DTMT: a novel deep transition architecture for neural machine translation[C]//Proceedings of 2019 AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2019: 224-231.
[4] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]//Proceedings of the 34th International Conference on Machine Learning. New York, USA: ACM Press, 2017: 1243-1252.
[5] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010.
[6] 哈里旦木·阿布都克里木, 刘洋, 孙茂松. 神经机器翻译系统在维吾尔语汉语翻译中的性能对比[J]. 清华大学学报(自然科学版), 2017, 57(8): 878-883. ABUDUKELIMU H, LIU Y, SUN M S. Performance comparison of neural machine translation systems in Uyghur-Chinese translation[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(8): 878-883. (in Chinese)
[7] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2019-05-24)[2020-09-10]. https://arxiv.org/pdf/1810.04805.pdf.
[8] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. (2019-07-26)[2020-09-10]. https://arxiv.org/pdf/1907.11692v1.pdf.
[9] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
[10] 李俊, 吕学强. 融合BERT语义加权与网络图的关键词抽取方法[J]. 计算机工程, 2020, 46(9): 89-94. LI J, LÜ X Q. Keyword extraction method based on BERT semantic weighting and network graph[J]. Computer Engineering, 2020, 46(9): 89-94. (in Chinese)
[11] RAJPURKAR P, JIA R, LIANG P. Know what you don't know: unanswerable questions for SQuAD[EB/OL]. (2018-06-11)[2020-09-10]. https://arxiv.org/pdf/1806.03822.pdf.
[12] ZHANG H, XU J, WANG J. Pretraining-based natural language generation for text summarization[EB/OL]. (2019-02-25)[2020-09-10]. https://arxiv.org/pdf/1902.09243v2.pdf.
[13] CLINCHANT S, JUNG K W, NIKOULINA V. On the use of BERT for neural machine translation[EB/OL]. (2019-09-27)[2020-09-10]. https://arxiv.org/pdf/1909.12744.pdf.
[14] LI L, JIANG X, LIU Q. Pretrained language models for document-level neural machine translation[EB/OL]. (2019-11-08)[2020-09-10]. https://arxiv.org/pdf/1911.03110.pdf.
[15] ZHU J, XIA Y, WU L, et al. Incorporating BERT into neural machine translation[EB/OL]. (2020-02-17)[2020-09-10]. https://arxiv.org/pdf/2002.06823.pdf.
[16] BERT-base-multilingual-uncased model[EB/OL]. [2020-09-10]. https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip.
[17] BERT-base-Chinese model[EB/OL]. [2020-09-10]. https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip.
[18] BERT-wwm-ext model[EB/OL]. [2020-09-10]. https://drive.google.com/file/d/1iNeYFhCBJWeUsIlnW_2K6SMwXkM4gLb_/view.
[19] CUI Y, CHE W, LIU T, et al. Pre-training with whole word masking for Chinese BERT[EB/OL]. (2020-02-17)[2020-09-10]. https://arxiv.org/pdf/1906.08101v2.pdf.
[20] RoBERTa-wwm-large-ext model[EB/OL]. [2020-09-10]. https://drive.google.com/open?id=1-2vEZfIFCdM1-vJ3GD6DlSyKT4eVXMKq.
[21] RoBERTa-wwm-ext model[EB/OL]. [2020-09-10]. https://drive.google.com/open?id=1eHM3l4fMo6DsQYGmey7UZGiTmQquHw25.
[22] RBTL3 model[EB/OL]. [2020-09-10]. https://drive.google.com/open?id=1qs5OasLXXjOnR2XuGUh12NanUl0pkjEv.
[23] JAWAHAR G, SAGOT B, SEDDAH D. What does BERT learn about the structure of language?[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. [S.l.]: Association for Computational Linguistics, 2019: 3651-3657.
[24] McCLOSKEY M, COHEN N J. Catastrophic interference in connectionist networks: the sequential learning problem[J]. Psychology of Learning and Motivation, 1989, 24: 109-165.
[25] SENNRICH R, HADDOW B, BIRCH A. Neural machine translation of rare words with subword units[EB/OL]. (2016-06-03)[2020-09-10]. https://arxiv.org/pdf/1508.07909v4.pdf.
[26] OTT M, EDUNOV S, BAEVSKI A, et al. fairseq: a fast, extensible toolkit for sequence modeling[EB/OL]. (2019-04-01)[2020-09-10]. https://arxiv.org/pdf/1904.01038.pdf.
[27] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. [S.l.]: Association for Computational Linguistics, 2002: 311-318.