
Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 85-95. doi: 10.19678/j.issn.1000-3428.0065455

• Artificial Intelligence and Pattern Recognition •

Chinese Named Entity Recognition Based on Dilated Gated Convolution Feature Fusion

Changpei YANG1, Liefa LIAO1,2,*   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China
    2. School of Software Engineering, Jiangxi University of Science and Technology, Nanchang 330000, China
  • Received: 2022-08-08    Online: 2023-08-15    Published: 2023-08-15
  • Contact: Liefa LIAO

  • Author biography: Changpei YANG (born 1996), male, M.S. candidate; his research interests include natural language processing and named entity recognition.
  • Funding: National Natural Science Foundation of China (71462018); National Natural Science Foundation of China (71761018)

Abstract:

In Chinese Named Entity Recognition (NER), the Long Short-Term Memory (LSTM) network, with its recurrent structure, can handle long-distance dependencies by capturing temporal features, but it captures features from only a single view and its ability to acquire information is therefore limited. A Convolutional Neural Network (CNN) processes text in parallel through multiple convolutional layers, which speeds up computation and captures the spatial features of text; however, simply stacking convolutional layers easily leads to vanishing gradients. To capture multi-dimensional text features simultaneously and mitigate the vanishing-gradient problem, this paper proposes a Chinese NER model, RoBERTa-wwm-DGCNN-BiLSTM-BMHA-CRF. First, the text is represented as character-level embedding vectors by RoBERTa-wwm, a pre-trained language model based on whole-word masking, to capture deep contextual semantic information. Second, a gating mechanism and residual connections are used to improve the Dilated CNN (DCNN) and reduce the risk of gradient vanishing; the Bi-directional LSTM (BiLSTM) and the Dilated Gated CNN (DGCNN) then capture the temporal and spatial features of the text, respectively. Third, a Bilinear Multi-Head Attention (BMHA) mechanism dynamically fuses the multi-dimensional text features. Finally, a Conditional Random Field (CRF) constrains the output to obtain the optimal label sequence. Experimental results show that the proposed model achieves F1 scores of 97.20%, 74.28%, and 95.74% on the Resume, Weibo, and MSRA datasets, respectively, demonstrating its effectiveness for Chinese NER.
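The pipeline described in the abstract (a character-level encoder, parallel BiLSTM and dilated gated convolution branches, bilinear multi-head attention fusion, and CRF decoding) can be sketched as follows. This is a minimal PyTorch sketch rather than the authors' implementation: the dilation schedule (1, 2, 4), the hidden size, the gated residual formulation, and the BilinearMultiHeadFusion module are illustrative assumptions, and CRF decoding is only indicated in a comment.

```python
# Minimal sketch of the RoBERTa-wwm-DGCNN-BiLSTM-BMHA-CRF pipeline.
# Hyperparameters and module internals are illustrative assumptions.
import torch
import torch.nn as nn


class DilatedGatedConv1d(nn.Module):
    """Dilated convolution with a gating mechanism and a residual mix."""

    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        padding = (kernel_size - 1) // 2 * dilation      # keep sequence length
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              padding=padding, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, seq_len)
        h, g = self.conv(x).chunk(2, dim=1)              # split into value and gate
        gate = torch.sigmoid(g)
        return x * (1 - gate) + torch.tanh(h) * gate     # gated residual connection


class BilinearMultiHeadFusion(nn.Module):
    """Simplified stand-in for the BMHA module: a bilinear interaction of the
    temporal and spatial features followed by multi-head self-attention."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, temporal, spatial):                # both: (batch, seq_len, dim)
        fused = self.bilinear(temporal, spatial)
        out, _ = self.attn(fused, fused, fused)
        return out


class ChineseNERModel(nn.Module):
    def __init__(self, encoder, hidden=768, num_labels=9):
        super().__init__()
        self.encoder = encoder                           # e.g. a RoBERTa-wwm encoder
        self.bilstm = nn.LSTM(hidden, hidden // 2, batch_first=True,
                              bidirectional=True)        # temporal branch
        self.dgcnn = nn.Sequential(*[DilatedGatedConv1d(hidden, dilation=d)
                                     for d in (1, 2, 4)])  # assumed dilation schedule
        self.fusion = BilinearMultiHeadFusion(hidden)
        self.emissions = nn.Linear(hidden, num_labels)
        # A CRF layer (e.g. torchcrf.CRF) would decode these emission scores.

    def forward(self, input_ids, attention_mask):
        x = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        temporal, _ = self.bilstm(x)                     # (batch, seq_len, hidden)
        spatial = self.dgcnn(x.transpose(1, 2)).transpose(1, 2)
        fused = self.fusion(temporal, spatial)
        return self.emissions(fused)                     # per-token scores for the CRF
```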

Key words: Named Entity Recognition (NER), RoBERTa-wwm model, dilated convolution, attention mechanism, feature fusion
