
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 160-167. doi: 10.19678/j.issn.1000-3428.0069057

• Artificial Intelligence and Pattern Recognition •

Research on Cross-Language Summarization in Chinese-Burmese-Vietnamese Based on Enhanced Linguistic Relationships

HE Zhilei1,2, GAO Shengxiang1,2,*, ZHU Enchang1,2, YU Zhengtao1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2023-12-19 Revised: 2024-03-27 Online: 2025-08-15 Published: 2024-06-14
  • Contact: GAO Shengxiang

  • Supported by: National Natural Science Foundation of China (U23A20388); National Natural Science Foundation of China (U21B2027); Yunnan Provincial Key Research and Development Program (202303AP140008); Yunnan Provincial Key Research and Development Program (202302AD080003); Kunming University of Science and Technology "Double First-Class" Science and Technology Special Project (202402AG050007); Yunnan Fundamental Research Project (202301AT070393); Kunming University of Science and Technology "Double First-Class" Construction Joint Special Project (202201BE070001-021)

Abstract:

Cross-Language Summarization (CLS) condenses and summarizes the core content of text in a source language (such as Burmese) into text in a target language (such as Chinese). CLS combines Machine Translation (MT) and Monolingual Summarization (MS) and requires the model to possess capabilities in both areas. In low-resource language scenarios, such as Vietnamese and Burmese, CLS faces the challenge of scarce training data. Moreover, Chinese and languages such as Burmese or Vietnamese belong to different language families and exhibit significant linguistic disparities, so current CLS methods often generalize poorly. To address this, taking the Burmese-Chinese and Vietnamese-Chinese language pairs as research subjects, we propose a language relationship-enhanced CLS approach. First, input sequences are transformed into consecutive word pairs. Then, the relationships between these consecutive word pairs in the source and target languages are calculated. Finally, a joint training method that integrates MT and MS is introduced to effectively capture the relationships between the target and source languages, improving the model's generalization ability and its capacity to handle continuous text. Experiments on a self-constructed dataset demonstrate that, compared with other baseline models, the proposed method achieves improvements of 5, 1, and 4 percentage points on the ROUGE-1, ROUGE-2, and ROUGE-L evaluation metrics, respectively.
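The first step described above, splitting a sequence into consecutive word pairs, and the ROUGE-style pair overlap used for evaluation can be sketched as follows. This is a minimal illustration with hypothetical function names, not the authors' implementation; the paper's actual relationship computation between source- and target-language pairs is not reproduced here.

```python
# Hedged sketch: (1) transform a token sequence into its consecutive
# word pairs (bigrams), as in the abstract's first step; (2) score a
# candidate summary against a reference with a simple ROUGE-2-style
# overlap ratio. All names are illustrative assumptions.

def consecutive_word_pairs(tokens):
    """Return the list of consecutive (bigram) word pairs in a token sequence."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

def pair_overlap(reference_pairs, candidate_pairs):
    """Fraction of reference word pairs that also appear in the candidate."""
    if not reference_pairs:
        return 0.0
    hits = sum(1 for pair in reference_pairs if pair in candidate_pairs)
    return hits / len(reference_pairs)

# Toy Chinese summary tokens (segmented words, not characters)
reference = ["明天", "天气", "晴朗"]       # reference summary
candidate = ["明天", "天气", "很", "晴朗"]  # system output

ref_pairs = consecutive_word_pairs(reference)
cand_pairs = consecutive_word_pairs(candidate)
print(pair_overlap(ref_pairs, cand_pairs))  # only ('明天', '天气') matches → 0.5
```

In practice, published ROUGE scores are computed with a full implementation that handles stemming, multiple references, and F-measures; the overlap ratio above only conveys the idea of matching consecutive word pairs.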

Key words: Cross-Language Summarization (CLS), low-resource language, language differences, continuous text, generalization ability
