[1] XI Xuefeng,ZHOU Guodong.A survey on deep learning for natural language processing[J].Acta Automatica Sinica,2016,42(10):1445-1465.(in Chinese)
奚雪峰,周国栋.面向自然语言处理的深度学习研究[J].自动化学报,2016,42(10):1445-1465.
[2] LI Liuyu,DU Qiu.The ability and inability of deep learning[J].International Financing,2018(8):25-27.(in Chinese)
李留宇,杜秋.深度学习的能与不能[J].国际融资,2018(8):25-27.
[3] DEVLIN J,CHANG M W,LEE K,et al.BERT:pre-training of deep bidirectional transformers for language understanding[EB/OL].[2019-06-01].https://arxiv.org/pdf/1810.04805.pdf.
[4] PETERS M E,NEUMANN M,IYYER M,et al.Deep contextualized word representations[EB/OL].[2019-06-01].https://arxiv.org/pdf/1802.05365.pdf.
[5] RADFORD A,NARASIMHAN K,SALIMANS T,et al.Improving language understanding by generative pre-training[EB/OL].[2019-06-01].https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[6] ZHANG Yukun,LIU Maofu,HU Huijun.Chinese medical entity classification and relationship extraction based on joint neural network model[J].Computer Engineering and Science,2019,41(6):1110-1118.(in Chinese)
张玉坤,刘茂福,胡慧君.基于联合神经网络模型的中文医疗实体分类与关系抽取[J].计算机工程与科学,2019,41(6):1110-1118.
[7] LI Jianlong,WANG Panqing,HAN Qiyu.Military named entity recognition based on bidirectional LSTM[J].Computer Engineering and Science,2019,41(4):713-718.(in Chinese)
李健龙,王盼卿,韩琪羽.基于双向LSTM的军事命名实体识别[J].计算机工程与科学,2019,41(4):713-718.
[8] BROWN P F,et al.Class-based n-gram models of natural language[J].Computational Linguistics,1992,18(4):467-479.
[9] MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2013:3111-3119.
[10] PENNINGTON J,SOCHER R,MANNING C.GloVe:global vectors for word representation[EB/OL].[2019-06-01].https://nlp.stanford.edu/pubs/glove.pdf.
[11] TURIAN J P,RATINOV L A,BENGIO Y.Word representations:a simple and general method for semi-supervised learning[C]//Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,USA:Association for Computational Linguistics,2010:384-394.
[12] MCCANN B,BRADBURY J,XIONG C,et al.Learned in translation:contextualized word vectors[C]//Proceedings of Advances in Neural Information Processing Systems.Cambridge,USA:MIT Press,2017:6294-6305.
[13] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of Advances in Neural Information Processing Systems.Cambridge,USA:MIT Press,2017:5998-6008.
[14] SUN Yu,WANG Shuohuan,LI Yukun,et al.ERNIE:enhanced representation through knowledge integration[EB/OL].[2019-06-01].https://arxiv.org/pdf/1904.09223v1.pdf.
[15] COLLOBERT R,WESTON J,BOTTOU L,et al.Natural language processing (almost) from scratch[J].Journal of Machine Learning Research,2011,12(8):2493-2537.
[16] KIROS R,ZHU Y,SALAKHUTDINOV R R,et al.Skip-thought vectors[C]//Proceedings of Advances in Neural Information Processing Systems.Cambridge,USA:MIT Press,2015:3294-3302.
[17] LECUN Y,BENGIO Y,HINTON G.Deep learning[J].Nature,2015,521(7553):436-444.
[18] HOWARD J,RUDER S.Universal language model fine-tuning for text classification[EB/OL].[2019-06-01].https://arxiv.org/pdf/1801.06146.pdf.
[19] ZHU H,PASCHALIDIS I C,TAHMASEBI A.Clinical concept extraction with contextual word embedding[EB/OL].[2019-06-01].https://arxiv.org/pdf/1810.10566.pdf.
[20] LEE J,YOON W,KIM S,et al.BioBERT:a pre-trained biomedical language representation model for biomedical text mining[EB/OL].[2019-06-01].https://arxiv.org/ftp/arxiv/papers/1901/1901.08746.pdf.
[21] SUN Xu,WANG Houfeng,LI Wenjie.Fast online training with frequency-adaptive learning rates for Chinese word segmentation and new word detection[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,USA:Association for Computational Linguistics,2012:253-262.
[22] ADHIKARI A,RAM A,TANG R,et al.DocBERT:BERT for document classification[EB/OL].[2019-06-01].https://arxiv.org/pdf/1904.08398v1.pdf.
[23] CONNEAU A,LAMPLE G,RINOTT R,et al.XNLI:evaluating cross-lingual sentence representations[EB/OL].[2019-06-01].https://arxiv.org/pdf/1809.05053.pdf.
[24] KOWSARI K,HEIDARYSAFA M,BROWN D E,et al.RMDL:random multimodel deep learning for classification[C]//Proceedings of the 2nd International Conference on Information System and Data Mining.New York,USA:ACM Press,2018:19-28.
[25] ZHANG Yue,YANG Jie.Chinese NER using lattice LSTM[EB/OL].[2019-06-01].https://arxiv.org/pdf/1805.02023.pdf.
[26] ARTETXE M,SCHWENK H.Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond[EB/OL].[2019-06-01].https://arxiv.org/pdf/1812.10464.pdf.