Abstract:
To improve Uyghur Named Entity Recognition (NER) in a low-resource setting without relying on hand-crafted features, this paper constructs a neural network model based on BiLSTM-CNN-CRF. First, a Convolutional Neural Network (CNN) is used to train character vectors that capture morphological features of Uyghur words, such as prefixes and suffixes. Then, a skip-gram model is trained on a large-scale corpus to generate low-dimensional, dense, real-valued word vectors that carry semantic information. Finally, the character vectors, part-of-speech vectors, and word vectors are concatenated and used as input to a BiLSTM-CRF deep neural network suited to Uyghur NER. Experimental results show that the proposed model recognizes named entities automatically and is robust, reaching an F1 value of 91.89%.
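The F1 value quoted in the abstract is the harmonic mean of precision and recall. As a rough illustration of how such a score is commonly computed for NER, the sketch below counts an entity as correct only when both its span and its type match exactly. It assumes BIO-style tags (`B-PER`, `I-PER`, `O`); the abstract does not state the tagging scheme or the matching criterion, and the helper names `extract_entities` and `entity_f1` are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    Assumption: BIO tagging. A stray I- tag with no open span of the
    same type is ignored rather than treated as the start of an entity.
    """
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.add((start, i, etype))  # end index is exclusive
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Entity-level F1 with exact span and type matching."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
    pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
    # One of two predicted entities is correct, so precision = recall = 0.5.
    print(entity_f1(gold, pred))  # 0.5
```

This sketch scores a single tag sequence; to score a whole test set, the entity counts would be accumulated over all sentences before computing precision and recall.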
Key words:
Recurrent Neural Network (RNN),
Convolutional Neural Network (CNN),
Conditional Random Field (CRF),
Uyghur,
Named Entity Recognition(NER)
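A CRF output layer scores whole tag sequences rather than choosing each tag independently, which lets it rule out invalid sequences such as an `I-PER` tag directly after `O`. The following is a minimal sketch of Viterbi decoding for a linear-chain CRF, not the authors' implementation. The emission scores stand in for what a BiLSTM would produce per token, and the score values, the tag set, and the name `viterbi_decode` are illustrative assumptions.

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag path for one sentence.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    Returns (best_path, best_score).
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, max(score)


if __name__ == "__main__":
    tags = ["O", "B-PER", "I-PER"]  # hypothetical tag set
    # Made-up scores: picking the top tag per token would give O, I-PER.
    emissions = [[1.0, 0.5, 0.0],
                 [0.0, 0.25, 2.0]]
    # All transitions are neutral except a large penalty on O -> I-PER.
    transitions = [[0.0, 0.0, -100.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    path, score = viterbi_decode(emissions, transitions)
    print([tags[k] for k in path], score)  # ['B-PER', 'I-PER'] 2.5
```

In the example, the per-token maxima would yield the invalid sequence `O, I-PER`; the transition penalty makes the decoder prefer `B-PER, I-PER` instead. In a trained model the transition scores are learned jointly with the network rather than set by hand.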
CLC number:
买买提阿依甫,吾守尔·斯拉木,帕丽旦·木合塔尔,杨文忠. 基于BiLSTM-CNN-CRF模型的维吾尔文命名实体识别[J]. 计算机工程, 2018, 44(8): 230-236.
Maimaitiayifu, SILAMU Wushouer, MUHETAER Palidan, YANG Wenzhong. Uyghur Named Entity Recognition Based on BiLSTM-CNN-CRF Model[J]. Computer Engineering, 2018, 44(8): 230-236.