
Computer Engineering, 2024, Vol. 50, Issue (1): 1-16. doi: 10.19678/j.issn.1000-3428.0068124

• Hot Topics and Reviews •

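  • Received: 2023-07-20  Published: 2024-01-15  Online: 2023-10-27
  • Corresponding author: Dengfeng YAO
  • Funding:
    National Natural Science Foundation of China (61966033); National Natural Science Foundation of China (62366050); National Social Science Fund of China (21BYY106); General Project of the State Language Commission of China (YB145-25)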

Survey of Uyghur Machine Translation Research

Halidanmu ABUDUKELIMU¹, Yutao HOU¹, Dengfeng YAO²*, Abudukelimu ABULIZI¹, Jishang CHEN¹

  1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, Xinjiang, China
    2. Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China


Abstract:

Uyghur machine translation is one of the important tasks in low-resource machine translation research in China, and its development and application can better promote cultural exchange and trade between different regions and ethnic groups. However, Uyghur is an agglutinative language, and machine translation for it faces problems such as complex morphology and scarce corpora. In recent years, at different stages of the development of Uyghur machine translation, researchers have continually optimized and innovated algorithms and models to address these characteristics and have achieved a number of research results; however, no systematic review of this work has been conducted. This paper comprehensively reviews research on Uyghur machine translation and, according to the methods used, divides it into three types: rule- and example-based, statistics-based, and neural network-based Uyghur machine translation. Related academic activities and corpus resources are also summarized. To further explore the potential of Uyghur machine translation, a preliminary study of the Uyghur-Chinese machine translation task is conducted with the ChatGPT model. The experimental results show that in the few-shot setting, translation performance first increases and then decreases as the number of examples grows, with the best performance at 10-shot. In addition, the chain-of-thought approach does not show better translation ability on the Uyghur machine translation task. Finally, future research directions for Uyghur machine translation are discussed.

Key words: Uyghur, rule- and example-based machine translation, statistical machine translation, neural machine translation (NMT), large language model (LLM)
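The few-shot experiment summarized above varies the number k of Uyghur-Chinese example pairs placed in the prompt before the sentence to be translated, and compares this with a chain-of-thought variant. The abstract does not give the authors' prompt wording, so the following is only a minimal sketch under stated assumptions: the instruction text, the `Uyghur:`/`Chinese:` labels, the chain-of-thought sentence, and the names `Example` and `build_k_shot_prompt` are all hypothetical, and the example pairs are placeholders rather than corpus data.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One in-context demonstration pair (hypothetical structure)."""
    source: str  # Uyghur sentence
    target: str  # Chinese reference translation


def build_k_shot_prompt(examples: list[Example], query: str, k: int,
                        chain_of_thought: bool = False) -> str:
    """Assemble a k-shot Uyghur->Chinese prompt from a hypothetical template.

    k = 0 gives a zero-shot prompt; the first k pairs of `examples` are used
    as demonstrations. If chain_of_thought is True, a step-by-step instruction
    (assumed wording) is added before the final answer cue.
    """
    if k < 0 or k > len(examples):
        raise ValueError(f"k must be between 0 and {len(examples)}, got {k}")
    lines = ["Translate the following Uyghur sentence into Chinese."]
    # Demonstrations: each pair is rendered as a source line and a target line.
    for ex in examples[:k]:
        lines.append(f"Uyghur: {ex.source}")
        lines.append(f"Chinese: {ex.target}")
    # The test sentence, left open for the model to complete.
    lines.append(f"Uyghur: {query}")
    if chain_of_thought:
        lines.append("Think step by step, then give the final Chinese translation.")
    lines.append("Chinese:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder pairs only; real data would come from a Uyghur-Chinese parallel corpus.
    pool = [Example(f"<uyghur sentence {i}>", f"<chinese reference {i}>") for i in range(10)]
    print(build_k_shot_prompt(pool, "<uyghur test sentence>", k=2))
```

Sweeping `k` (for example 0, 1, 5, 10, 20) and scoring each prompt's output with a translation metric is one way to produce the kind of shot-count curve the abstract reports; the model call and the scoring step are deliberately left out of this sketch.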