
Computer Engineering ›› 2012, Vol. 38 ›› Issue (3): 169-171,175. doi: 10.3969/j.issn.1000-3428.2012.03.057

• Artificial Intelligence and Recognition Technology •

Chinese-Uyghur Statistical Machine Translation Based on Syntactical Reordering

CHEN Li-juan 1,2, ZHANG Heng 1,2, DONG Xing-hua 1,2, Turghun Osman 1, ZHOU Jun-lin 3

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China; 3. Xinjiang Branch of Chinese Academy of Sciences, Urumqi 830011, China
  • Received: 2011-07-25 Online: 2012-02-05 Published: 2012-02-05
  • About the authors: CHEN Li-juan (born 1985), female, master's student, main research interest: natural language processing; ZHANG Heng, master's student; DONG Xing-hua, doctoral student; Turghun Osman, assistant researcher; ZHOU Jun-lin, researcher
  • Supported by:
    High-Tech Fund Project of the Western Action Plan of the Chinese Academy of Sciences (KGC X2-YN-507)

Abstract: Chinese and Uyghur differ greatly in morphology and word order. In Chinese-to-Uyghur statistical machine translation, these differences produce many unknown words and disordered word order in the Uyghur output. To address these problems, a Chinese syntactic reordering method is proposed on the basis of a study of Chinese and Uyghur word order, and morphological analysis is then applied to Uyghur. The approach is verified on a factor-based statistical machine translation (SMT) system. Experimental results show a substantial improvement in translation quality over the baseline phrase-based system, with the BLEU score rising from 15.72 to 19.17.

Key words: Statistical Machine Translation (SMT), syntactical reordering, morphology, factored model, translation model
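
The abstract does not state the reordering rules themselves, so the following is only a minimal sketch of the general idea of source-side syntactic reordering, not the authors' method. It assumes a single hypothetical rule: inside each verb phrase, the verb is moved after its complements, so that a Chinese subject-verb-object clause takes the verb-final order that Uyghur uses. The tree encoding, the function names `reorder` and `leaves`, and the Penn Chinese Treebank-style labels (`IP`, `NP`, `VP`, `VV`) are all assumptions made for illustration.

```python
# Illustrative sketch only: one hypothetical reordering rule (verb-final VP).
# An internal node is (label, [children]); a leaf is (pos_tag, word).

def is_leaf(node):
    """A leaf stores a word string in position 1 instead of a child list."""
    return isinstance(node[1], str)


def reorder(node):
    """Return a new tree in which every VP places its verbs (VV) last."""
    if is_leaf(node):
        return node
    label, children = node
    children = [reorder(child) for child in children]
    if label == "VP":
        verbs = [c for c in children if is_leaf(c) and c[0] == "VV"]
        others = [c for c in children if not (is_leaf(c) and c[0] == "VV")]
        children = others + verbs  # complements first, verb at the end
    return (label, children)


def leaves(node):
    """Read the words of a tree from left to right."""
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1] for word in leaves(child)]


# "我 读 书" (I read book), parsed by hand for this example.
tree = ("IP", [
    ("NP", [("PN", "我")]),
    ("VP", [("VV", "读"), ("NP", [("NN", "书")])]),
])

print(" ".join(leaves(reorder(tree))))  # prints: 我 书 读
```

In a pipeline of this kind, the reordered Chinese sentences would replace the originals in both training and test data before word alignment, so that the translation model sees source and target in a similar order. A real system would need many more rules, or rules learned from data, than this single one.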
