
Computer Engineering (计算机工程), 2022, Vol. 48, Issue (2): 306-313. doi: 10.19678/j.issn.1000-3428.0060062

• Development Research and Engineering Application •

Named Entity Recognition in Tourism Based on Directed Graph Model

CUI Liping1,2,3, Altenbek Gulila1,2,3, WANG Zhiyue1

  1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China;
  2. Xinjiang Key Laboratory of Multi-language Information Technology, Urumqi 830046, China;
  3. The Base of Kazakh and Kirghiz Language, National Language Resource Monitoring and Research Center of Minority Languages, Urumqi 830046, China
  • Received: 2020-11-19 Revised: 2021-01-27 Published: 2021-01-28
  • About the authors: CUI Liping (b. 1994), female, master's student, main research interest: natural language processing; Altenbek Gulila (corresponding author), professor, Ph.D.; WANG Zhiyue, master's student.
  • Funding:
    National Natural Science Foundation of China (62062062); Xinjiang University Fund (BS180250).

Abstract: Named Entity Recognition (NER) in the tourism domain is a key step in constructing a tourism knowledge graph. Compared with entities in the general domain, entities in tourism text tend to be long, polysemous, and heavily nested, which lowers recognition accuracy. To address this problem, a directed graph neural network model that incorporates dictionary information, named L-CGNN, is proposed for NER in tourism. Pretrained word vectors are passed through a Convolutional Neural Network (CNN) with multiple convolutions to extract rich character features. A directed graph of each sentence is then constructed by matching dictionary words against the sentence, which yields an adjacency matrix that fuses character and word information. The word vectors containing local features and the adjacency matrix are fed into a Graph Neural Network (GNN) to extract global semantic information, and a Conditional Random Field (CRF) decodes this information to obtain the optimal label sequence. Experimental results show that, compared with the Lattice LSTM, ID-CNN+CRF, and CRF models, L-CGNN achieves higher recognition accuracy, with F1 scores of 86.86% on the tourism dataset and 95.02% on the resume dataset.

Key words: knowledge graph, Named Entity Recognition (NER), Convolutional Neural Network (CNN), Graph Neural Network (GNN), Conditional Random Field (CRF)
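
The dictionary-based graph construction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact edge rules used by L-CGNN, so everything here is an assumption made for illustration: the function names `match_lexicon_words` and `build_adjacency`, the `max_len` parameter, the toy lexicon, and the choice to add one directed edge between adjacent characters and one directed edge from the first to the last character of each matched dictionary word.

```python
from typing import List, Set, Tuple


def match_lexicon_words(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int]]:
    """Return (start, end) character indices, end inclusive, of every
    lexicon word of length 2..max_len that occurs in the sentence."""
    spans = []
    for start in range(len(sentence)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(sentence):
                break
            if sentence[start:end] in lexicon:
                spans.append((start, end - 1))
    return spans


def build_adjacency(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[List[int]]:
    """Build a directed adjacency matrix over the characters of a sentence
    (hypothetical edge rules, not taken from the paper)."""
    n = len(sentence)
    adj = [[0] * n for _ in range(n)]
    # Character edges: each character points to its successor.
    for i in range(n - 1):
        adj[i][i + 1] = 1
    # Word edges: the first character of a matched word points to its last.
    for start, end in match_lexicon_words(sentence, lexicon, max_len):
        adj[start][end] = 1
    return adj


# Toy example with a nested entity: "天山天池" contains "天山" and "天池".
sentence = "新疆天山天池"
lexicon = {"新疆", "天山", "天池", "天山天池"}
adj = build_adjacency(sentence, lexicon)
for row in adj:
    print(row)
```

In this sketch, two-character words coincide with the character-to-successor edges, so only longer matches such as "天山天池" add a new entry (from index 2 to index 5). A real model would likely distinguish edge types or weight them; that detail is not recoverable from the abstract.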

CLC number: