
Computer Engineering (计算机工程)


Punctuation Restoration Method Based on MEGA Network and Hierarchical Prediction

  • Published: 2024-04-10


Abstract: Punctuation restoration, also known as punctuation prediction, is the classic natural language processing task of adding appropriate punctuation marks to unpunctuated text to improve its readability. In recent years, with the development of pre-trained models and deepening research on punctuation restoration, performance on the task has steadily improved. However, Transformer-based pre-trained models are limited in their ability to extract local information from long input sequences, which hinders the prediction of the final punctuation marks. Additionally, previous studies treated punctuation labels simply as symbols to be predicted, overlooking the contextual attributes of different punctuation marks and the relationships between them. To address these issues, this paper introduces the Moving Average Equipped Gated Attention (MEGA) network as an auxiliary module to strengthen the model's ability to extract local information, and constructs a hierarchical prediction module that exploits the contextual attributes of different punctuation marks and the relationships between them for the final classification. Experiments are conducted with various Transformer-based pre-trained models on datasets in different languages. Results on the English punctuation dataset IWSLT show that applying the MEGA module and the hierarchical prediction module yields performance gains for most pre-trained models. Notably, DeBERTaV3 xlarge achieves an F1 score of 85.5% on the IWSLT REF test set, a 1.2% improvement over the baseline and the best result reported on the REF test set to date. The proposed model also achieves the highest accuracy to date on the Chinese punctuation dataset.
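The abstract states that MEGA augments attention with a moving-average component to capture local context. As a rough illustration only, and not the paper's implementation, the heart of such local mixing is a damped exponential moving average over the token sequence; the scalar, one-feature parameterization below is a deliberate simplification of MEGA's multi-dimensional EMA:

```python
def damped_ema(xs, alpha, delta):
    """Damped exponential moving average over a 1-D feature sequence.

    xs:    list of floats (one feature per time step, for simplicity)
    alpha: smoothing weight in (0, 1)
    delta: damping factor in (0, 1]
    Each output mixes the current input with a decayed running state,
    so nearby tokens dominate the result -- a simple form of the local
    context mixing that MEGA adds alongside gated attention.
    """
    ys, state = [], 0.0
    for x in xs:
        state = alpha * x + (1.0 - alpha * delta) * state
        ys.append(state)
    return ys
```

With `alpha = 0.5` and `delta = 1.0`, a constant input sequence converges geometrically toward the input value, showing how the running state emphasizes recent tokens.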
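The hierarchical prediction idea (first deciding whether any punctuation follows a token, and only then choosing which mark) can be sketched as follows. The two linear heads, the 0.5 gate threshold, and the label set are hypothetical and may differ from the paper's actual module design:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(vec, mat):
    """Multiply a dim-vector by a dim x n_out weight matrix (list of rows)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def hierarchical_predict(h, w_gate, w_cls, labels=("COMMA", "PERIOD", "QUESTION")):
    """Two-stage punctuation decision per token representation.

    Stage 1: binary gate -- does any punctuation follow this token?
    Stage 2: only if gated in, choose which punctuation mark.
    w_gate (dim x 2) and w_cls (dim x 3) are hypothetical linear heads.
    """
    preds = []
    for vec in h:
        p_none, p_punct = softmax(matvec(vec, w_gate))
        if p_punct <= 0.5:
            preds.append("O")  # no punctuation after this token
        else:
            scores = softmax(matvec(vec, w_cls))
            preds.append(labels[scores.index(max(scores))])
    return preds
```

Splitting the decision this way lets the gate specialize in the common "no punctuation" case, while the second head only distinguishes among marks, which is one plausible reading of how punctuation relationships could be exploited.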