Research on Sentence Alignment Based on Modeling Word Pairs

doi:10.19678/j.issn.1000-3428.0051060

Abstract

Sentence alignment is the process of mapping each sentence in a source text to its translation in the target text. Working within a neural network framework, this paper proposes a sentence alignment method based on the observation that an aligned source and target sentence pair contains a large number of mutually aligned words. A Gated Relevance Network (GRN) captures the semantic relations between word pairs drawn from the source sentence and the target sentence, and these relations are used to decide whether the two sentences are aligned. In an alignment evaluation on non-monotonic text, the method reaches an F1 score of 93.8%, effectively improving the accuracy of sentence alignment.

Key words: sentence alignment, word pairs, Bidirectional Recurrent Neural Network (Bi-RNN), Gated Relevance Network (GRN), semantic interaction
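To make the word-pair idea concrete, the following is a minimal sketch of how a gated relevance layer could score every source and target word pair. It is not the authors' implementation. It assumes the commonly described GRN form, in which a bilinear tensor term and a single-layer term are mixed by a learned gate; the function name `grn_scores`, the parameter names (`M`, `V`, `W_g`, `b_g`, `b`, `u`), the use of `tanh`, the dimensions, and the random values standing in for trained weights and Bi-RNN hidden states are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def grn_scores(H_src, H_tgt, M, V, W_g, b_g, b, u):
    """Score every (source word, target word) pair.

    Assumed shapes (hypothetical, for illustration only):
      H_src: (m, d)    hidden states of the m source words
      H_tgt: (n, d)    hidden states of the n target words
      M:     (r, d, d) bilinear tensor with r slices
      V:     (r, 2d)   single-layer weights
      W_g:   (r, 2d)   gate weights;  b_g, b, u: (r,)
    Returns an (m, n) matrix of relevance scores.
    """
    m, d = H_src.shape
    n = H_tgt.shape[0]
    # Bilinear term: h_i^T M[k] h_j for every pair (i, j) and slice k.
    bilinear = np.einsum("id,rde,je->ijr", H_src, M, H_tgt)
    # Concatenation [h_i; h_j] for every pair, shape (m, n, 2d).
    pairs = np.concatenate(
        [
            np.broadcast_to(H_src[:, None, :], (m, n, d)),
            np.broadcast_to(H_tgt[None, :, :], (m, n, d)),
        ],
        axis=-1,
    )
    # Single-layer term and gate, both of shape (m, n, r).
    single = np.tanh(pairs @ V.T)
    gate = sigmoid(pairs @ W_g.T + b_g)
    # The gate mixes the two terms; u projects the r values to one score.
    combined = gate * bilinear + (1.0 - gate) * single + b
    return combined @ u


# Toy inputs: random values stand in for Bi-RNN states and trained weights.
rng = np.random.default_rng(0)
m, n, d, r = 3, 4, 5, 2
H_src = rng.normal(size=(m, d))
H_tgt = rng.normal(size=(n, d))
M = rng.normal(size=(r, d, d))
V = rng.normal(size=(r, 2 * d))
W_g = rng.normal(size=(r, 2 * d))
b_g = rng.normal(size=r)
b = rng.normal(size=r)
u = rng.normal(size=r)

scores = grn_scores(H_src, H_tgt, M, V, W_g, b_g, b, u)
print(scores.shape)
```

The resulting matrix has one row per source word and one column per target word. A downstream classifier, not shown here, would consume it to decide whether the two sentences are aligned.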

CLC number:

TP391

DING Ying, LI Junhui, ZHOU Guodong. Research on sentence alignment based on modeling word pairs[J]. Computer Engineering, 2019, 45(6): 211-217.

https://www.ecice06.com/CN/Y2019/V45/I6/211


