Computer Engineering ›› 2018, Vol. 44 ›› Issue (8): 230-236. doi: 10.19678/j.issn.1000-3428.0050502

• Artificial Intelligence and Recognition Technology •

Uyghur Named Entity Recognition Based on BiLSTM-CNN-CRF Model

Maimaitiayifu, SILAMU Wushouer, MUHETAER Palidan, YANG Wenzhong

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Received: 2018-02-15 Online: 2018-08-15 Published: 2018-08-15
  • About the authors: Maimaitiayifu (born 1981), male, Ph.D., main research interests: machine learning and online public opinion analysis; SILAMU Wushouer, professor, academician of the Chinese Academy of Engineering, doctoral supervisor; MUHETAER Palidan (corresponding author), Ph.D.; YANG Wenzhong, associate professor.
  • Funding:

    National Key Basic Research Program of China (2014CB340506); National Natural Science Foundation of China (61363063); Open Project of the Multilingual Key Laboratory of Xinjiang University (XJDX 0905-2013-01).

Abstract:

To improve Uyghur Named Entity Recognition (NER) performance when resources are scarce and without relying on hand-crafted features, this paper constructs a neural network model based on BiLSTM-CNN-CRF. First, a Convolutional Neural Network (CNN) is used to train character vectors that capture morphological features of Uyghur words, such as suffixes and prefixes. Then, a skip-gram model is trained on a large-scale corpus to generate low-dimensional, dense, real-valued word vectors that carry semantic information. Finally, the character vectors, part-of-speech vectors, and word vectors are concatenated and used as the input to a BiLSTM-CRF deep neural network suited to Uyghur NER. Experimental results show that the proposed model can recognize named entities automatically and is robust, reaching an F1 score of 91.89%.

Key words: Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), Conditional Random Field (CRF), Uyghur, Named Entity Recognition (NER)
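
The input construction described in the abstract can be illustrated with a small sketch. It is written in plain Python with no deep learning framework, and it is not the authors' implementation: every dimension, the random weights, the `tanh` activation, and the helper names (`rand_matrix`, `char_cnn_vector`, `token_input`) are assumptions chosen only to show how a character-level CNN with max-pooling could yield a fixed-size vector per word, which is then concatenated with a part-of-speech vector and a word vector before being passed to a BiLSTM-CRF tagger.

```python
import math
import random

random.seed(0)

# All sizes are illustrative assumptions, not values from the paper.
CHAR_VOCAB, CHAR_DIM = 40, 8    # character inventory and embedding size
N_FILTERS, KERNEL = 16, 3       # number of CNN filters and window width
WORD_DIM, POS_DIM = 50, 10      # word (skip-gram style) and POS vector sizes


def rand_matrix(rows, cols):
    """Random stand-in for parameters that would normally be learned."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


char_emb = rand_matrix(CHAR_VOCAB, CHAR_DIM)
conv_w = rand_matrix(N_FILTERS, KERNEL * CHAR_DIM)


def char_cnn_vector(char_ids):
    """Convolve over a word's characters, then max-pool over positions."""
    # Zero-pad both ends so every character position gets a full window.
    pad = [[0.0] * CHAR_DIM] * (KERNEL // 2)
    x = pad + [char_emb[c] for c in char_ids] + pad
    feature_maps = []
    for i in range(len(char_ids)):
        window = [v for row in x[i:i + KERNEL] for v in row]
        feature_maps.append(
            [math.tanh(sum(w * v for w, v in zip(f, window))) for f in conv_w]
        )
    # Max over character positions gives one value per filter.
    return [max(column) for column in zip(*feature_maps)]


def token_input(char_ids, word_vec, pos_vec):
    """Concatenate character, part-of-speech, and word vectors for one token."""
    return char_cnn_vector(char_ids) + pos_vec + word_vec


# Hypothetical token: five character ids plus stand-in word and POS vectors.
word_vec = [random.gauss(0.0, 1.0) for _ in range(WORD_DIM)]
pos_vec = [random.gauss(0.0, 1.0) for _ in range(POS_DIM)]
vec = token_input([3, 17, 5, 22, 9], word_vec, pos_vec)
print(len(vec))  # 76 = 16 (char CNN) + 10 (POS) + 50 (word)
```

Max-pooling over character positions makes the character vector independent of word length, so words with different numbers of prefixes and suffixes still map to inputs of the same size.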

CLC number: