Computer Engineering ›› 2021, Vol. 47 ›› Issue (2): 39-45. doi: 10.19678/j.issn.1000-3428.0056522

• Artificial Intelligence and Pattern Recognition •

Named Entity Recognition for Chinese Tourism Texts Combining Word-Character Guided Attention Network

Xieraili Seti1,2, Aishan Wumaier1,2, WANG Lulu1,2, Tuergen Yibulayin1,2, MA Zhekang2,3, Maihemuti Maimaiti1,2

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China;
  2. Xinjiang Key Laboratory of Multi-language Information Technology, Xinjiang University, Urumqi 830046, China;
  3. College of Software, Xinjiang University, Urumqi 830046, China
  • Received: 2019-11-07 Revised: 2020-01-16 Online: 2021-02-15 Published: 2020-02-21
  • About the authors: Xieraili Seti (born 1994), male, master's student, whose main research interests are natural language processing and named entity recognition; Aishan Wumaier, associate professor; WANG Lulu, Ph.D. candidate; Tuergen Yibulayin, professor and doctoral supervisor; MA Zhekang, master's student; Maihemuti Maimaiti (corresponding author), laboratory technician, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61262060, 61662077); National Key Research and Development Program of China (2017YFB1002103); Open Project of the Key Laboratory of Xinjiang Uygur Autonomous Region (2018D04019).

Abstract: Traditional Named Entity Recognition (NER) methods based on word vector representations usually neglect character-level semantics, the positional information between characters, and the associations between characters and words. To address this problem, this paper proposes an NER method for Chinese tourism texts based on a Word-Character Guided Attention Network (WCGAN). A Word-Guided Attention Network (WGAN) captures the sequence information between words and the information carried by key words, while a Character-Guided Attention Network (CGAN) captures character semantics and the positional information between characters. Together they strengthen the relevance and complementarity between words and characters, enabling the recognition of named entities in Chinese tourism texts. Experimental results show that WCGAN achieves F values of 93.491% and 92.860% on the ResumeNER and TourismNER benchmark datasets, respectively, outperforming Bi-LSTM+CRF, Char-Dense, and other methods.

Key words: Named Entity Recognition (NER), Character-Guided Attention Network (CGAN), Word-Guided Attention Network (WGAN), character semantics, information complementarity, position information
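
The abstract reports F values without stating how they are computed. As a minimal sketch, assuming the usual entity-level F1 for NER (an entity counts as correct only when both its boundaries and its type match), the metric could be computed as below. The BIO tag scheme, the `LOC`/`ORG` labels, and the helper names `extract_spans` and `prf` are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tag scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 over a list of sentences."""
    n_gold = n_pred = n_hit = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        n_gold += len(g)
        n_pred += len(p)
        n_hit += len(g & p)  # exact match on boundaries and type
    precision = n_hit / n_pred if n_pred else 0.0
    recall = n_hit / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical one-sentence example: the prediction misses the second entity.
gold = [["B-LOC", "I-LOC", "O", "B-ORG"]]
pred = [["B-LOC", "I-LOC", "O", "O"]]
p, r, f = prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints "P=1.000 R=0.500 F=0.667"
```

Under this convention, a partially correct boundary earns no credit, which is why span-level scores are typically lower than per-character tagging accuracy.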

CLC Number: