[1] LIN Y, LIU Z, SUN M, et al.Learning entity and relation embeddings for knowledge graph completion[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York, USA:AAAI Press, 2015:134-145.
[2] 段丹丹, 唐加山, 温勇, 等.基于BERT模型的中文短文本分类算法[J].计算机工程, 2021, 47(1):79-86.
DUAN D D, TANG J S, WEN Y, et al.Chinese short text classification algorithm based on BERT model[J].Computer Engineering, 2021, 47(1):79-86.(in Chinese)
[3] SOON W M, NG H T, LIM D C Y.A machine learning approach to coreference resolution of noun phrases[J].Computational Linguistics, 2001, 27(4):521-544.
[4] RADFORD A, NARASIMHAN K, SALIMANS T, et al.Improving language understanding by generative pre-training[EB/OL].[2021-01-05].https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.
[5] DEVLIN J, CHANG M W, LEE K, et al.BERT:pre-training of deep bidirectional transformers for language understanding[EB/OL].[2021-01-05].https://aclanthology.org/N19-1423.pdf.
[6] QIU X, SUN T, XU Y, et al.Pre-trained models for natural language processing:a survey[EB/OL].[2021-01-05].https://arxiv.org/pdf/2003.08271v2.pdf.
[7] HOWARD J, RUDER S.Universal language model fine-tuning for text classification[EB/OL].[2021-01-05].https://aclanthology.org/P18-1031.pdf.
[8] ROSSET C, XIONG C, PHAN M, et al.Knowledge-aware language model pretraining[EB/OL].[2021-01-05].https://openreview.net/pdf?id=OAdGsaptOXy.
[9] LIU H C, YOU J X, LI Z W, et al.Fuzzy Petri nets for knowledge representation and reasoning:a literature review[J].Engineering Applications of Artificial Intelligence, 2017(60):45-56.
[10] GUO S, WANG Q, WANG B, et al.SSE:semantically smooth embedding for knowledge graphs[J].IEEE Transactions on Knowledge and Data Engineering, 2017(29):884-897.
[11] BORDES A, USUNIER N, GARCIA-DURAN A, et al.Translating embeddings for modeling multi-relational data[EB/OL].[2021-01-05].https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf.
[12] DING Y X, WU R, ZHANG X.Ontology-based knowledge representation for malware individuals and families[J].Computers & Security, 2019(87):101574.
[13] 刘知远, 孙茂松, 林衍凯, 等.知识表示学习研究进展[J].计算机研究与发展, 2016, 53(2):247-261.
LIU Z Y, SUN M S, LIN Y K, et al.Knowledge representation learning:a review[J].Journal of Computer Research and Development, 2016, 53(2):247-261.(in Chinese)
[14] 卢晨阳, 康雁, 杨成荣, 等.基于语义结构的迁移学习文本特征对齐算法[J].计算机工程, 2019, 45(5):116-121.
LU C Y, KANG Y, YANG C R, et al.Text feature alignment algorithm for transfer learning based on semantic structure[J].Computer Engineering, 2019, 45(5):116-121.(in Chinese)
[15] WAN J, HUANG X.KaLM at SemEval-2020 task 4:knowledge-aware language models for comprehension and generation[EB/OL].[2021-01-05].https://arxiv.org/pdf/2005.11768v2.pdf.
[16] SHI Y, ZHANG W Q, CAI M, et al.Efficient one-pass decoding with NNLM for speech recognition[J].IEEE Signal Processing Letters, 2014, 21(4):377-381.
[17] GOLDBERG Y, LEVY O.word2vec explained:deriving Mikolov et al.'s negative-sampling word-embedding method[EB/OL].[2021-01-05].https://arxiv.org/pdf/1402.3722.pdf.
[18] PENNINGTON J, SOCHER R, MANNING C D.GloVe:global vectors for word representation[C]//Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing.Doha, Qatar:Association for Computational Linguistics, 2014:1532-1543.
[19] JOULIN A, GRAVE E, BOJANOWSKI P, et al.Bag of tricks for efficient text classification[EB/OL].[2021-01-05].https://pdfs.semanticscholar.org/892e/53fe5cd39f037cb2a961499f42f3002595dd.pdf.
[20] TAI K S, SOCHER R, MANNING C D.Improved semantic representations from tree-structured long short-term memory networks[EB/OL].[2021-01-05].https://aclanthology.org/P15-1150.pdf.
[21] PETERS M E, NEUMANN M, IYYER M, et al.Deep contextualized word representations[EB/OL].[2021-01-05].https://aclanthology.org/N18-1202.pdf.
[22] YANG Z, DAI Z, YANG Y, et al.XLNet:generalized autoregressive pretraining for language understanding[EB/OL].[2021-01-05].https://proceedings.neurips.cc/paper/2019/file/dc6a7e655d7e5840e66733e9ee67cc69-Paper.pdf.
[23] KENTER T, BORISOV A, DE RIJKE M.Siamese CBOW:optimizing word embeddings for sentence representations[EB/OL].[2021-01-05].https://aclanthology.org/P16-1089.pdf.
[24] GUTHRIE D, ALLISON B, LIU W, et al.A closer look at skip-gram modelling[EB/OL].[2021-01-05].http://www.lrec-conf.org/proceedings/lrec2006/pdf/357_pdf.pdf.
[25] PENG H, LI J, SONG Y, et al.Incrementally learning the hierarchical softmax function for neural language models[EB/OL].[2021-01-05].http://home.cse.ust.hk/~yqsong/papers/2017-AAAI-Incremental.pdf.
[26] MCCANN B, BRADBURY J, XIONG C, et al.Learned in translation:contextualized word vectors[EB/OL].[2021-01-05].https://papers.nips.cc/paper/2017/file/20c86a628232a67e7bd46f76fba7ce12-Paper.pdf.
[27] RESNIK P.Semantic similarity in a taxonomy:an information-based measure and its application to problems of ambiguity in natural language[J].Journal of Artificial Intelligence Research, 1999, 11:95-130.
[28] BOJANOWSKI P, GRAVE E, JOULIN A, et al.Enriching word vectors with subword information[J].Transactions of the Association for Computational Linguistics, 2017, 5:135-146.
[29] LEVY O, GOLDBERG Y.Neural word embedding as implicit matrix factorization[J].Advances in Neural Information Processing Systems, 2014, 27:2177-2185.
[30] GREFF K, SRIVASTAVA R K, KOUTNÍK J, et al.LSTM:a search space odyssey[J].IEEE Transactions on Neural Networks and Learning Systems, 2016, 28(10):2222-2232.
[31] WU X, ZHANG T, ZANG L, et al."Mask and infill":applying masked language model to sentiment transfer[EB/OL].[2021-01-05].https://arxiv.org/pdf/1908.08039.pdf.
[32] GHAZVININEJAD M, LEVY O, LIU Y, et al.Mask-predict:parallel decoding of conditional masked language models[EB/OL].[2021-01-05].https://aclanthology.org/D19-1633.pdf.
[33] SONG K, TAN X, QIN T, et al.MASS:masked sequence to sequence pre-training for language generation[EB/OL].[2021-01-05].https://www.microsoft.com/en-us/research/uploads/prod/2019/06/MASS-paper-updated-002.pdf.
[34] DONG L, YANG N, WANG W, et al.Unified language model pre-training for natural language understanding and generation[EB/OL].[2021-01-05].https://papers.nips.cc/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf.
[35] CHENG Y, FU S, TANG M, et al.Multi-task deep neural network enabled optical performance monitoring from directly detected PDM-QAM signals[J].Optics Express, 2019, 27(13):19062-19074.
[36] SUN Y, WANG S, LI Y, et al.ERNIE 2.0:a continual pre-training framework for language understanding[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York, USA:AAAI Press, 2020:8968-8975.
[37] ZHANG Z, HAN X, LIU Z, et al.ERNIE:enhanced language representation with informative entities[EB/OL].[2021-01-05].https://aclanthology.org/P19-1139.pdf.
[38] CONNEAU A, KIELA D, SCHWENK H, et al.Supervised learning of universal sentence representations from natural language inference data[EB/OL].[2021-01-05].https://research.fb.com/wp-content/uploads/2017/09/emnlp2017.pdf.
[39] LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al.Neural architectures for named entity recognition[EB/OL].[2021-01-05].https://aclanthology.org/N16-1030.pdf.
[40] LIU Y, OTT M, GOYAL N, et al.RoBERTa:a robustly optimized BERT pretraining approach[EB/OL].[2021-01-05].https://export.arxiv.org/pdf/1907.11692.
[41] CUI Y, CHE W, LIU T, et al.Pre-training with whole word masking for Chinese BERT[EB/OL].[2021-01-05].https://arxiv.org/pdf/1906.08101v2.pdf.
[42] LAN Z, CHEN M, GOODMAN S, et al.ALBERT:a lite BERT for self-supervised learning of language representations[EB/OL].[2021-01-05].https://openreview.net/pdf?id=H1eA7AEtvS.
[43] JOSHI M, CHEN D, LIU Y, et al.SpanBERT:improving pre-training by representing and predicting spans[EB/OL].[2021-01-05].https://www.cs.princeton.edu/~danqic/papers/tacl2020.pdf.
[44] JIAO X, YIN Y, SHANG L, et al.TinyBERT:distilling BERT for natural language understanding[EB/OL].[2021-01-05].https://aclanthology.org/2020.findings-emnlp.372.pdf.
[45] HINTON G, VINYALS O, DEAN J.Distilling the knowledge in a neural network[J].Computer Science, 2015, 14(7):38-39.
[46] AGARWAL O, GE H, SHAKERI S, et al.Large scale knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training[EB/OL].[2021-01-05].https://aclanthology.org/2021.naacl-main.278.pdf.
[47] YAO L, MAO C, LUO Y.KG-BERT:BERT for knowledge graph completion[EB/OL].[2021-01-05].https://arxiv.org/pdf/1909.03193.pdf.
[48] PETERS M E, NEUMANN M, LOGAN IV R L, et al.Knowledge enhanced contextual word representations[EB/OL].[2021-01-05].https://arxiv.org/pdf/1909.04164.pdf.
[49] LIU H, SINGH P.ConceptNet-a practical commonsense reasoning tool-kit[J].BT Technology Journal, 2004, 22(4):211-226.
[50] WANG X, GAO T, ZHU Z, et al.KEPLER:a unified model for knowledge embedding and pre-trained language representation[EB/OL].[2021-01-05].https://bakser.github.io/files/TACL-KEPLER/KEPLER.pdf.
[51] LIU W, ZHOU P, ZHAO Z, et al.K-BERT:enabling language representation with knowledge graph[C]//Proceedings of AAAI Conference on Artificial Intelligence.New York, USA:AAAI Press, 2020:2901-2908.
[52] WANG R, TANG D, DUAN N, et al.K-adapter:infusing knowledge into pre-trained models with adapters[EB/OL].[2021-01-05].https://arxiv.org/pdf/2002.01808v3.pdf.
[53] ZHOU H, YOUNG T, HUANG M, et al.Commonsense knowledge aware conversation generation with graph attention[EB/OL].[2021-01-05].http://coai.cs.tsinghua.edu.cn/hml/media/files/2018_commonsense_ZhouHao_3_TYVQ7Iq.pdf.
[54] WU S, LI Y, ZHANG D, et al.Diverse and informative dialogue generation with context-specific commonsense knowledge awareness[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.San Diego, USA:Association for Computational Linguistics, 2020:5811-5820.
[55] ZHOU H, YOUNG T, HUANG M, et al.Commonsense knowledge aware conversation generation with graph attention[EB/OL].[2021-01-05].http://coai.cs.tsinghua.edu.cn/hml/media/files/2018_commonsense_ZhouHao_3_TYVQ7Iq.pdf.
[56] HAYKIN S, KOSKO B.Gradient-based learning applied to document recognition[EB/OL].[2021-01-05].https://axon.cs.byu.edu/~martinez/classes/678/Papers/Convolution_nets.pdf.
[57] WEI J, REN X, LI X, et al.NEZHA:neural contextualized representation for Chinese language understanding[EB/OL].[2021-01-05].https://lonepatient.top/2020/01/20/NEZHA.
[58] SHAOUL C, BAAYEN R H, WESTBURY C F.N-gram probability effects in a cloze task[J].The Mental Lexicon, 2014, 9(3):437-472.
[59] DIAO S, BAI J, SONG Y, et al.ZEN:pre-training Chinese text encoder enhanced by N-gram representations[EB/OL].[2021-01-05].https://aclanthology.org/2020.findings-emnlp.425.pdf.
[60] CUI Y, CHE W, LIU T, et al.Pre-training with whole word masking for Chinese BERT[EB/OL].[2021-01-05].https://arxiv.org/pdf/1906.08101v2.pdf.
[61] LEE J, YOON W, KIM S, et al.BioBERT:a pre-trained biomedical language representation model for biomedical text mining[J].Bioinformatics, 2020, 36(4):1234-1240.
[62] LEE J S, HSIANG J.Patent classification by fine-tuning BERT language model[EB/OL].[2021-01-05].https://arxiv.org/ftp/arxiv/papers/1906/1906.02124.pdf.
[63] BELTAGY I, LO K, COHAN A.SciBERT:a pretrained language model for scientific text[EB/OL].[2021-01-05].https://aclanthology.org/D19-1371.pdf.
[64] ZHAO S, GUPTA R, SONG Y, et al.Extreme language model compression with optimal subwords and shared projections[EB/OL].[2021-01-05].https://openreview.net/pdf?id=S1x6ueSKPr.