
Computer Engineering (计算机工程), 2020, Vol. 46, Issue 2, pp. 309-314. doi: 10.19678/j.issn.1000-3428.0053080

• Development Research and Engineering Application •

Research on Influence of Uyghur Complex Morphology on Chinese-Uyghur Machine Translation

MUNIRE·Muhetare (1,2,3), LI Xiao (1,2), YANG Yating (1,2)

  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China;
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
  • Received: 2018-11-07  Revised: 2019-03-04  Published: 2019-03-14
  • About the authors: MUNIRE·Muhetare (born 1989), female, Ph.D. candidate, whose main research interests are natural language processing and machine translation; LI Xiao (corresponding author), research professor and doctoral supervisor; YANG Yating, associate research professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (U1703133); "Light of West China" Talent Cultivation and Introduction Program of the Chinese Academy of Sciences (2017-XBQNXZ-A-005); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017472); Major Science and Technology Project of Xinjiang Uygur Autonomous Region (2016A03007-3); High-Level Talent Introduction Project of Xinjiang Uygur Autonomous Region (Y839031201).

Abstract: Uyghur has comparatively complex morphology in which configurational (inflectional) affixes play a significant role, and its grammar differs considerably from that of Chinese. Targeting these morphological characteristics, this paper analyzes the role of Uyghur affixes in Chinese-to-Uyghur statistical machine translation. A phrase-based Chinese-Uyghur statistical machine translation system is built, and comparative experiments are conducted on Chinese-Uyghur parallel corpora whose Uyghur side is segmented at five levels of granularity: word, stem, maximum stem, stem-affix, and stem-suffix. The influence of these granularities on word alignment quality and language model quality in Chinese-Uyghur machine translation is then studied. Experimental results show that, among the five granularities, stem-level and stem-suffix-level Uyghur target-side corpora yield significantly improved translation quality.

Key words: Uyghur morphology, configuration affix, affix granularity, statistical machine translation, translation quality
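The abstract names the segmentation granularities used on the Uyghur side but does not define them. The Python sketch below illustrates four of them on a single word, under explicitly stated assumptions: a morphological analyzer (hypothetical, not shown) has already split the word into a stem and its configurational affixes; the example word kitablirimizda ("in our books") and its split kitab + lir + imiz + da are illustrative; the `+` marker for bound morphemes and all function names are hypothetical; and "stem level" is assumed to mean keeping the stem and dropping the affixes. The maximum-stem level is omitted because its definition cannot be recovered from the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyzedWord:
    """A word already split by a (hypothetical) morphological analyzer."""
    stem: str
    affixes: tuple[str, ...]  # configurational affixes in surface order


def word_level(w: AnalyzedWord) -> list[str]:
    # The unsegmented surface word is a single token.
    return [w.stem + "".join(w.affixes)]


def stem_level(w: AnalyzedWord) -> list[str]:
    # Assumption: only the stem is kept; affixes are dropped (lossy).
    return [w.stem]


def stem_affix_level(w: AnalyzedWord) -> list[str]:
    # Stem plus every affix as its own token; "+" marks a bound morpheme.
    return [w.stem] + ["+" + a for a in w.affixes]


def stem_suffix_level(w: AnalyzedWord) -> list[str]:
    # Stem plus all affixes merged into a single ending token.
    if not w.affixes:
        return [w.stem]
    return [w.stem, "+" + "".join(w.affixes)]


def rejoin(tokens: list[str]) -> str:
    # Glue "+"-marked tokens back onto the preceding token to restore
    # surface words (cannot undo the lossy stem level).
    out: list[str] = []
    for tok in tokens:
        if tok.startswith("+") and out:
            out[-1] += tok[1:]
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    # Illustrative split of "kitablirimizda" ("in our books").
    word = AnalyzedWord("kitab", ("lir", "imiz", "da"))
    for name, fn in [("word", word_level), ("stem", stem_level),
                     ("stem-affix", stem_affix_level),
                     ("stem-suffix", stem_suffix_level)]:
        print(f"{name:<12}{fn(word)}")
    # stem-affix -> ['kitab', '+lir', '+imiz', '+da']
    # stem-suffix -> ['kitab', '+lirimizda']
```

Marking bound morphemes with a prefix such as `+` is one simple way to keep the stem-affix and stem-suffix segmentations reversible, so that decoder output on the target side can be rejoined into ordinary Uyghur words before evaluation.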
