
Computer Engineering, 2020, Vol. 46, Issue (5): 86-93. doi: 10.19678/j.issn.1000-3428.0054793

• Artificial Intelligence and Pattern Recognition •

Cross-Lingual Sentence Summarization System Based on Contrastive Attention Mechanism

YIN Mingming, SHI Xiaojing, YU Hongfei, DUAN Xiangyu

  Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 2019-04-30  Revised: 2019-07-21  Published: 2019-08-06
  • About the authors: YIN Mingming (born 1994), male, master's student; research interests: natural language processing and machine translation. SHI Xiaojing and YU Hongfei, master's students. DUAN Xiangyu (corresponding author), associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61673289); Key Special Project for Intergovernmental International Science and Technology Innovation Cooperation of the National Key Research and Development Program of China (2016YFE0132100).

Abstract: Current research on sentence summarization focuses mainly on the monolingual setting, in which the source sentence and the target summary phrase are in the same language. Monolingual sentence summarization, however, severely limits how quickly information can be obtained from texts written in other languages. To address this problem, this paper proposes a cross-lingual sentence summarization system. Borrowing the idea of back-translation, the system uses a neural machine translation (NMT) system to translate the source side of a monolingual sentence summarization parallel corpus into another language, and pairs these translations with the target-side summary phrases of the same corpus to build a cross-lingual pseudo-parallel corpus. On this basis, a contrastive attention mechanism is used to capture the irrelevant information between the target-side and source-side sequences, which addresses the length mismatch between source sentences and target summaries in the traditional attention mechanism. Experimental results show that, compared with pipeline-based monolingual sentence summarization systems, the proposed cross-lingual system generates summary phrases that are more fluent and closer to natural human expression, and reaches a level close to that of monolingual sentence summarization.

Key words: cross-lingual sentence summarization, parallel corpus, pseudo corpus, contrastive attention mechanism, pipeline method
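To make the two steps in the abstract more concrete, the Python sketch below builds a pseudo-parallel corpus in the back-translation style and computes a toy contrastive attention for a single decoder query. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, `translate` stands in for any NMT system, the opponent weights are assumed to come from a softmax over negated attention scores, and the loss `-log p_conv - weight * log(1 - p_opp)` is one hypothetical way to reward the conventional attention while penalizing the opponent attention.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def build_pseudo_corpus(
    pairs: Sequence[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Back-translation-style pseudo corpus: translate each source sentence
    into the other language and pair it with the original target summary."""
    return [(translate(source), summary) for source, summary in pairs]


def softmax(scores: Vector) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, values: Sequence[Vector]) -> List[float]:
    """Context vector: attention-weighted sum of the value vectors."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def contrastive_attention(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]):
    """Return conventional weights, opponent weights and both context vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]  # dot-product scores
    conventional = softmax(scores)
    # ASSUMPTION: opponent attention = softmax over negated scores, which moves
    # weight toward the source positions least related to the query.
    opponent = softmax([-s for s in scores])
    return (
        conventional,
        opponent,
        weighted_sum(conventional, values),
        weighted_sum(opponent, values),
    )


def contrastive_loss(log_p_conv: float, log_p_opp: float, weight: float = 1.0) -> float:
    """HYPOTHETICAL objective: raise p(summary | conventional attention) and
    push p(summary | opponent attention) toward zero. Requires log_p_opp < 0."""
    return -log_p_conv - weight * math.log1p(-math.exp(log_p_opp))
```

In a real system the pseudo corpus would train an encoder-decoder summarizer, and the attention would run over full source and target sequences with multiple heads; the sketch leaves out batching, the decoder and training, and only shows the data flow and the direction of the contrast.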
