[1] LIN Y, LIU Z, SUN M, et al.Learning entity and relation embeddings for knowledge graph completion[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York, USA:AAAI Press, 2015:134-145.
[2] 段丹丹, 唐加山, 温勇, 等.基于BERT模型的中文短文本分类算法[J].计算机工程, 2021, 47(1):79-86.
DUAN D D, TANG J S, WEN Y, et al.Chinese short text classification algorithm based on BERT model[J].Computer Engineering, 2021, 47(1):79-86.(in Chinese)
[3] SOON W M, NG H T, LIM D C Y.A machine learning approach to coreference resolution of noun phrases[J].Computational Linguistics, 2001, 27(4):521-544.
[4] RADFORD A, NARASIMHAN K, SALIMANS T, et al.Improving language understanding by generative pre-training[EB/OL].[2021-01-05].https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.
[5] DEVLIN J, CHANG M W, LEE K, et al.BERT:pre-training of deep bidirectional transformers for language understanding[EB/OL].[2021-01-05].https://aclanthology.org/N19-1423.pdf.
[6] QIU X, SUN T, XU Y, et al.Pre-trained models for natural language processing:a survey[EB/OL].[2021-01-05].https://arxiv.org/pdf/2003.08271v2.pdf.
[7] HOWARD J, RUDER S.Universal language model fine-tuning for text classification[EB/OL].[2021-01-05].https://aclanthology.org/P18-1031.pdf.
[8] ROSSET C, XIONG C, PHAN M, et al.Knowledge-aware language model pretraining[EB/OL].[2021-01-05].https://openreview.net/pdf?id=OAdGsaptOXy.
[9] LIU H C, YOU J X, LI Z W, et al.Fuzzy Petri nets for knowledge representation and reasoning:a literature review[J].Engineering Applications of Artificial Intelligence, 2017(60):45-56.
[10] GUO S, WANG Q, WANG B, et al.SSE:semantically smooth embedding for knowledge graphs[J].IEEE Transactions on Knowledge and Data Engineering, 2017(29):884-897.
[11] BORDES A, USUNIER N, GARCIA-DURAN A, et al.Translating embeddings for modeling multi-relational data[EB/OL].[2021-01-05].https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf.
[12] DING Y X, WU R, ZHANG X.Ontology-based knowledge representation for malware individuals and families[J].Computers & Security, 2019(87):101574.
[13] 刘知远, 孙茂松, 林衍凯, 等.知识表示学习研究进展[J].计算机研究与发展, 2016, 53(2):247-261.
LIU Z Y, SUN M S, LIN Y K, et al.Knowledge representation learning:a review[J].Journal of Computer Research and Development, 2016, 53(2):247-261.(in Chinese)
[14] 卢晨阳, 康雁, 杨成荣, 等.基于语义结构的迁移学习文本特征对齐算法[J].计算机工程, 2019, 45(5):116-121.
LU C Y, KANG Y, YANG C R, et al.Text feature alignment algorithm for transfer learning based on semantic structure[J].Computer Engineering, 2019, 45(5):116-121.(in Chinese)
[15] WAN J, HUANG X.KaLM at SemEval-2020 task 4:knowledge-aware language models for comprehension and generation[EB/OL].[2021-01-05].https://arxiv.org/pdf/2005.11768v2.pdf.
[16] SHI Y, ZHANG W Q, CAI M, et al.Efficient one-pass decoding with NNLM for speech recognition[J].IEEE Signal Processing Letters, 2014, 21(4):377-381.
[17] GOLDBERG Y, LEVY O.word2vec explained:deriving Mikolov et al.'s negative-sampling word-embedding method[EB/OL].[2021-01-05].https://arxiv.org/pdf/1402.3722.pdf.
[18] PENNINGTON J, SOCHER R, MANNING C D.GloVe:global vectors for word representation[C]//Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing.Doha, Qatar:Association for Computational Linguistics, 2014:1532-1543.
[19] JOULIN A, GRAVE E, BOJANOWSKI P, et al.Bag of tricks for efficient text classification[EB/OL].[2021-01-05].https://pdfs.semanticscholar.org/892e/53fe5cd39f037cb2a961499f42f3002595dd.pdf.
[20] TAI K S, SOCHER R, MANNING C D.Improved semantic representations from tree-structured long short-term memory networks[EB/OL].[2021-01-05].https://aclanthology.org/P15-1150.pdf.
[21] PETERS M E, NEUMANN M, IYYER M, et al.Deep contextualized word representations[EB/OL].[2021-01-05].https://aclanthology.org/N18-1202.pdf.
[22] YANG Z, DAI Z, YANG Y, et al.XLNet:generalized autoregressive pretraining for language understanding[EB/OL].[2021-01-05].https://proceedings.neurips.cc/paper/2019/file/dc6a7e655d7e5840e66733e9ee67cc69-Paper.pdf.
[23] KENTER T, BORISOV A, DE RIJKE M.Siamese CBOW:optimizing word embeddings for sentence representations[EB/OL].[2021-01-05].https://aclanthology.org/P16-1089.pdf.
[24] GUTHRIE D, ALLISON B, LIU W, et al.A closer look at skip-gram modelling[EB/OL].[2021-01-05].http://www.lrec-conf.org/proceedings/lrec2006/pdf/357_pdf.pdf.
[25] PENG H, LI J, SONG Y, et al.Incrementally learning the hierarchical softmax function for neural language models[EB/OL].[2021-01-05].http://home.cse.ust.hk/~yqsong/papers/2017-AAAI-Incremental.pdf.
[26] MCCANN B, BRADBURY J, XIONG C, et al.Learned in translation:contextualized word vectors[EB/OL].[2021-01-05].https://papers.nips.cc/paper/2017/file/20c86a628232a67e7bd46f76fba7ce12-Paper.pdf.
[27] RESNIK P.Semantic similarity in a taxonomy:an information-based measure and its application to problems of ambiguity in natural language[J].Journal of Artificial Intelligence Research, 1999, 11:95-130.
[28] BOJANOWSKI P, GRAVE E, JOULIN A, et al.Enriching word vectors with subword information[J].Transactions of the Association for Computational Linguistics, 2017, 5:135-146.
[29] LEVY O, GOLDBERG Y.Neural word embedding as implicit matrix factorization[J].Advances in Neural Information Processing Systems, 2014, 27:2177-2185.
[30] GREFF K, SRIVASTAVA R K, KOUTNÍK J, et al.LSTM:a search space odyssey[J].IEEE Transactions on Neural Networks and Learning Systems, 2016, 28(10):2222-2232.
[31] WU X, ZHANG T, ZANG L, et al."Mask and infill":applying masked language model to sentiment transfer[EB/OL].[2021-01-05].https://arxiv.org/pdf/1908.08039.pdf.
[32] GHAZVININEJAD M, LEVY O, LIU Y, et al.Mask-predict:parallel decoding of conditional masked language models[EB/OL].[2021-01-05].https://aclanthology.org/D19-1633.pdf.
[33] SONG K, TAN X, QIN T, et al.MASS:masked sequence to sequence pre-training for language generation[EB/OL].[2021-01-05].https://www.microsoft.com/en-us/research/uploads/prod/2019/06/MASS-paper-updated-002.pdf.
[34] DONG L, YANG N, WANG W, et al.Unified language model pre-training for natural language understanding and generation[EB/OL].[2021-01-05].https://papers.nips.cc/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf.
[35] CHENG Y, FU S, TANG M, et al.Multi-task deep neural network enabled optical performance monitoring from directly detected PDM-QAM signals[J].Optics Express, 2019, 27(13):19062-19074.
[36] SUN Y, WANG S, LI Y, et al.ERNIE 2.0:a continual pre-training framework for language understanding[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York, USA:AAAI Press, 2020:8968-8975.
[37] ZHANG Z, HAN X, LIU Z, et al.ERNIE:enhanced language representation with informative entities[EB/OL].[2021-01-05].https://aclanthology.org/P19-1139.pdf.
[38] CONNEAU A, KIELA D, SCHWENK H, et al.Supervised learning of universal sentence representations from natural language inference data[EB/OL].[2021-01-05].https://research.fb.com/wp-content/uploads/2017/09/emnlp2017.pdf.
[39] LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al.Neural architectures for named entity recognition[EB/OL].[2021-01-05].https://aclanthology.org/N16-1030.pdf.
[40] LIU Y, OTT M, GOYAL N, et al.RoBERTa:a robustly optimized BERT pretraining approach[EB/OL].[2021-01-05].https://export.arxiv.org/pdf/1907.11692.
[41] CUI Y, CHE W, LIU T, et al.Pre-training with whole word masking for Chinese BERT[EB/OL].[2021-01-05].https://arxiv.org/pdf/1906.08101v2.pdf.
[42] LAN Z, CHEN M, GOODMAN S, et al.ALBERT:a lite BERT for self-supervised learning of language representations[EB/OL].[2021-01-05].https://openreview.net/pdf?id=H1eA7AEtvS.
[43] JOSHI M, CHEN D, LIU Y, et al.SpanBERT:improving pre-training by representing and predicting spans[EB/OL].[2021-01-05].https://www.cs.princeton.edu/~danqic/papers/tacl2020.pdf.
[44] JIAO X, YIN Y, SHANG L, et al.TinyBERT:distilling BERT for natural language understanding[EB/OL].[2021-01-05].https://aclanthology.org/2020.findings-emnlp.372.pdf.
[45] HINTON G, VINYALS O, DEAN J.Distilling the knowledge in a neural network[J].Computer Science, 2015, 14(7):38-39.
[46] AGARWAL O, GE H, SHAKERI S, et al.Large scale knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training[EB/OL].[2021-01-05].https://aclanthology.org/2021.naacl-main.278.pdf.
[47] YAO L, MAO C, LUO Y.KG-BERT:BERT for knowledge graph completion[EB/OL].[2021-01-05].https://arxiv.org/pdf/1909.03193.pdf.
[48] PETERS M E, NEUMANN M, LOGAN IV R L, et al.Knowledge enhanced contextual word representations[EB/OL].[2021-01-05].https://arxiv.org/pdf/1909.04164.pdf.
[49] LIU H, SINGH P.ConceptNet-a practical commonsense reasoning tool-kit[J].BT Technology Journal, 2004, 22(4):211-226.
[50] WANG X, GAO T, ZHU Z, et al.KEPLER:a unified model for knowledge embedding and pre-trained language representation[EB/OL].[2021-01-05].https://bakser.github.io/files/TACL-KEPLER/KEPLER.pdf.
[51] LIU W, ZHOU P, ZHAO Z, et al.K-BERT:enabling language representation with knowledge graph[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York, USA:AAAI Press, 2020:2901-2908.
[52] WANG R, TANG D, DUAN N, et al.K-adapter:infusing knowledge into pre-trained models with adapters[EB/OL].[2021-01-05].https://arxiv.org/pdf/2002.01808v3.pdf.
[53] ZHOU H, YOUNG T, HUANG M, et al.Commonsense knowledge aware conversation generation with graph attention[EB/OL].[2021-01-05].http://coai.cs.tsinghua.edu.cn/hml/media/files/2018_commonsense_ZhouHao_3_TYVQ7Iq.pdf.
[54] WU S, LI Y, ZHANG D, et al.Diverse and informative dialogue generation with context-specific commonsense knowledge awareness[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.San Diego, USA:Association for Computational Linguistics, 2020:5811-5820.
[55] ZHOU H, YOUNG T, HUANG M, et al.Commonsense knowledge aware conversation generation with graph attention[EB/OL].[2021-01-05].http://coai.cs.tsinghua.edu.cn/hml/media/files/2018_commonsense_ZhouHao_3_TYVQ7Iq.pdf.
[56] HAYKIN S, KOSKO B.Gradient-based learning applied to document recognition[EB/OL].[2021-01-05].https://axon.cs.byu.edu/~martinez/classes/678/Papers/Convolution_nets.pdf.
[57] WEI J, REN X, LI X, et al.NEZHA:neural contextualized representation for Chinese language understanding[EB/OL].[2021-01-05].https://lonepatient.top/2020/01/20/NEZHA.
[58] SHAOUL C, BAAYEN R H, WESTBURY C F.N-gram probability effects in a cloze task[J].The Mental Lexicon, 2014, 9(3):437-472.
[59] DIAO S, BAI J, SONG Y, et al.ZEN:pre-training Chinese text encoder enhanced by N-gram representations[EB/OL].[2021-01-05].https://aclanthology.org/2020.findings-emnlp.425.pdf.
[60] CUI Y, CHE W, LIU T, et al.Pre-training with whole word masking for Chinese BERT[EB/OL].[2021-01-05].https://arxiv.org/pdf/1906.08101v2.pdf.
[61] LEE J, YOON W, KIM S, et al.BioBERT:a pre-trained biomedical language representation model for biomedical text mining[J].Bioinformatics, 2020, 36(4):1234-1240.
[62] LEE J S, HSIANG J.Patent classification by fine-tuning BERT language model[EB/OL].[2021-01-05].https://arxiv.org/ftp/arxiv/papers/1906/1906.02124.pdf.
[63] BELTAGY I, LO K, COHAN A.SciBERT:a pretrained language model for scientific text[EB/OL].[2021-01-05].https://aclanthology.org/D19-1371.pdf.
[64] ZHAO S, GUPTA R, SONG Y, et al.Extreme language model compression with optimal subwords and shared projections[EB/OL].[2021-01-05].https://openreview.net/pdf?id=S1x6ueSKPr.