
计算机工程 (Computer Engineering), 2022, Vol. 48, Issue (10): 73-80. doi: 10.19678/j.issn.1000-3428.0062525

• Artificial Intelligence and Pattern Recognition •

基于CNN-Head Transformer编码器的中文命名实体识别

史占堂1,2,3, 马玉鹏1,2,3, 赵凡1,2,3, 马博1,2,3   

  1. 中国科学院新疆理化技术研究所, 乌鲁木齐 830011;
    2. 中国科学院大学, 北京 100049;
    3. 新疆民族语音语言信息处理实验室, 乌鲁木齐 830011
  • 收稿日期:2021-08-28 修回日期:2021-10-22 发布日期:2021-11-16
  • About the authors: SHI Zhantang (born 1995), male, M.S. candidate; his main research interest is natural language processing. MA Yupeng and ZHAO Fan are research fellows and hold Ph.D. degrees; MA Bo is an associate research fellow and holds a Ph.D. degree.
  • Funding:
    National Key Research and Development Program of China (2018YFC0825300, 2018YFC0823002); Major Special Program of Xinjiang Uygur Autonomous Region (2020A03004-4); "Western Light" Talent Training Program (2018-XBQNXZ-A-003); Youth Innovation Promotion Association of the Chinese Academy of Sciences (Ke Fa Ren Han Zi [2019] No. 26).

Chinese Named Entity Recognition Based on CNN-Head Transformer Encoder

SHI Zhantang1,2,3, MA Yupeng1,2,3, ZHAO Fan1,2,3, MA Bo1,2,3   

  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
  • Received: 2021-08-28; Revised: 2021-10-22; Published: 2021-11-16

摘要: 基于多头自注意力机制的Transformer作为主流特征提取器在多种自然语言处理任务中取得了优异表现,但应用于命名实体识别任务时存在一字多词、增加额外存储与词典匹配时间等问题。提出一种CNN-Head Transformer编码器(CHTE)模型,在未使用外部词典和分词工具的基础上,通过自注意力机制捕获全局语义信息,利用不同窗口大小的CNN获取Transformer中6个注意力头的Value向量,使CHTE模型在保留全局语义信息的同时增强局部特征和潜在词信息表示,并且应用自适应的门控残差连接融合当前层和子层特征,提升了Transformer在命名实体识别领域的性能表现。在Weibo和Resume数据集上的实验结果表明,CHTE模型的F1值相比于融合词典信息的Lattice LSTM和FLAT模型分别提升了3.77、2.24和1.30、0.31个百分点,具有更高的中文命名实体识别准确性。

关键词: 命名实体识别, 自注意力机制, Transformer编码器, 卷积神经网络, 残差连接

Abstract: The Transformer, built on the multi-head self-attention mechanism, is a mainstream feature extractor that achieves excellent performance in a variety of Natural Language Processing (NLP) tasks, but applying it to Named Entity Recognition (NER) raises problems such as one character matching multiple words, extra storage, and lexicon-matching time. This study proposes a Convolutional Neural Network (CNN)-Head Transformer Encoder (CHTE) model that captures global semantic information through the self-attention mechanism without using external lexicons or word segmentation tools. CNNs with different window sizes produce the Value vectors of six attention heads in the Transformer, so the model preserves global semantic information while strengthening local features and latent word representations, and an adaptive gated residual connection fuses the current-layer and sub-layer features, improving the Transformer's performance on NER. Experimental results show that the F1 value of the CHTE model exceeds those of the lexicon-augmented Lattice Long Short-Term Memory (Lattice LSTM) and FLAT-lattice Transformer (FLAT) models by 3.77 and 2.24 percentage points on the Weibo dataset and by 1.30 and 0.31 percentage points on the Resume dataset, respectively, demonstrating higher accuracy in Chinese NER.
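To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of a CHTE-style attention layer: 1-D CNNs with different window sizes supply the Value vectors of six of the attention heads, the remaining heads use the usual linear Value projection, and an adaptive gate blends the layer input with the sub-layer output. The model dimension, head count, kernel sizes, and the exact form of the gate are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a CNN-Head attention layer with a gated residual
# connection, assuming PyTorch.  Hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNHeadAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=8, cnn_heads=6,
                 kernel_sizes=(2, 3, 4, 5, 6, 7)):
        super().__init__()
        assert d_model % n_heads == 0 and len(kernel_sizes) == cnn_heads
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        # Standard linear Value projection for the remaining heads.
        self.v_proj = nn.Linear(d_model, (n_heads - cnn_heads) * self.d_head)
        # One convolution per CNN head; different kernel sizes capture
        # local n-gram (latent word) features of different lengths.
        self.v_convs = nn.ModuleList([
            nn.Conv1d(d_model, self.d_head, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.out_proj = nn.Linear(d_model, d_model)
        # Adaptive gate for the residual connection (assumed form).
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head)
        # Values for the linear heads.
        v_lin = self.v_proj(x).view(b, t, -1, self.d_head)
        # Values for the CNN heads: convolve over the sequence dimension,
        # then trim even-sized kernels back to length t.
        v_cnn = [conv(x.transpose(1, 2))[:, :, :t].transpose(1, 2)
                 for conv in self.v_convs]
        v_cnn = torch.stack(v_cnn, dim=2)      # (b, t, cnn_heads, d_head)
        v = torch.cat([v_cnn, v_lin], dim=2)   # (b, t, n_heads, d_head)

        scores = torch.einsum('bqhd,bkhd->bhqk', q, k) / self.d_head ** 0.5
        attn = F.softmax(scores, dim=-1)
        ctx = torch.einsum('bhqk,bkhd->bqhd', attn, v).reshape(b, t, -1)
        sub = self.out_proj(ctx)

        # Gated residual connection: per dimension, decide how much of the
        # layer input x versus the sub-layer output to keep.
        g = torch.sigmoid(self.gate(torch.cat([x, sub], dim=-1)))
        return g * x + (1 - g) * sub


# Example usage: layer = CNNHeadAttention(); y = layer(torch.randn(2, 20, 128))
```

The intent of the design, as the abstract describes it, is that the convolutional Value vectors inject local and latent-word information without an external lexicon, while the gate lets each position adaptively weigh the current-layer input against the sub-layer features.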

Key words: Named Entity Recognition (NER), self-attention mechanism, Transformer encoder, Convolutional Neural Network (CNN), residual connection
