
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 160-167. doi: 10.19678/j.issn.1000-3428.0069057

• Artificial Intelligence and Pattern Recognition •

Research on Cross-Language Summarization in Chinese-Burmese-Vietnamese Based on Enhanced Linguistic Relationships

HE Zhilei1,2, GAO Shengxiang1,2,*, ZHU Enchang1,2, YU Zhengtao1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2023-12-19 Revised: 2024-03-27 Online: 2025-08-15 Published: 2024-06-14
  • Contact: GAO Shengxiang

  • Supported by: National Natural Science Foundation of China (U23A20388); National Natural Science Foundation of China (U21B2027); Yunnan Provincial Key Research and Development Program (202303AP140008); Yunnan Provincial Key Research and Development Program (202302AD080003); Kunming University of Science and Technology "Double First-Class" Science and Technology Special Project (202402AG050007); Yunnan Fundamental Research Project (202301AT070393); Kunming University of Science and Technology "Double First-Class" Construction Joint Special Project (202201BE070001-021)

Abstract:

Cross-Language Summarization (CLS) condenses and summarizes the core content of text in a source language (such as Burmese) into text in a target language (such as Chinese). CLS combines Machine Translation (MT) and Monolingual Summarization (MS) and requires the model to possess capabilities in both areas. In low-resource language scenarios, such as Vietnamese and Burmese, CLS faces the challenge of scarce training data. Moreover, Chinese and languages such as Burmese or Vietnamese belong to different language families and exhibit significant linguistic disparities, so current CLS methods often generalize poorly. To address this, taking the Burmese-Chinese and Vietnamese-Chinese language pairs as research subjects, we propose a language relationship-enhanced CLS approach. First, input sequences are transformed into consecutive word pairs. Then, the relationships between these consecutive word pairs in the source and target languages are calculated. Finally, a joint training method that integrates MT and MS is introduced to effectively capture the relationships between the target and source languages, improving the model's generalization ability and its capacity to handle continuous text. Experiments on a self-constructed dataset demonstrate that, compared with other baseline models, the proposed method achieves improvements of 5, 1, and 4 percentage points on the ROUGE-1, ROUGE-2, and ROUGE-L evaluation metrics, respectively.
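The first step described above, splitting a sequence into consecutive word pairs, and the ROUGE-style pair overlap used for evaluation can be sketched as follows. This is a minimal illustration with hypothetical function names, not the authors' implementation; the paper's actual relationship computation between source- and target-language pairs is not reproduced here.

```python
# Hedged sketch: (1) transform a token sequence into its consecutive
# word pairs (bigrams), as in the abstract's first step; (2) score a
# candidate summary against a reference with a simple ROUGE-2-style
# overlap ratio. All names are illustrative assumptions.

def consecutive_word_pairs(tokens):
    """Return the list of consecutive (bigram) word pairs in a token sequence."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

def pair_overlap(reference_pairs, candidate_pairs):
    """Fraction of reference word pairs that also appear in the candidate."""
    if not reference_pairs:
        return 0.0
    hits = sum(1 for pair in reference_pairs if pair in candidate_pairs)
    return hits / len(reference_pairs)

# Toy Chinese summary tokens (segmented words, not characters)
reference = ["明天", "天气", "晴朗"]       # reference summary
candidate = ["明天", "天气", "很", "晴朗"]  # system output

ref_pairs = consecutive_word_pairs(reference)
cand_pairs = consecutive_word_pairs(candidate)
print(pair_overlap(ref_pairs, cand_pairs))  # only ('明天', '天气') matches → 0.5
```

In practice, published ROUGE scores are computed with a full implementation that handles stemming, multiple references, and F-measures; the overlap ratio above only conveys the idea of matching consecutive word pairs.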

Key words: Cross-Language Summarization (CLS), low-resource language, language differences, continuous text, generalization ability
