Computer Engineering ›› 2021, Vol. 47 ›› Issue (1): 94-100. doi: 10.19678/j.issn.1000-3428.0056734

• Artificial Intelligence and Pattern Recognition •

  • About the authors: DING Chenhui (b. 1994), male, M.S. candidate; his research interest is natural language processing. XIA Hongbin, associate professor, Ph.D. LIU Yuan, professor, Ph.D. supervisor.
  • Foundation items: National Natural Science Foundation of China (61672264); National Key Technology R&D Program of China (2015BAH54F01).

Short Text Classification Model Combining Knowledge Graph and Attention Mechanism

DING Chenhui1, XIA Hongbin1,2, LIU Yuan1,2   

1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China;
    2. Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi, Jiangsu 214122, China
  • Received: 2019-11-28  Revised: 2020-01-15  Published: 2020-02-10



Abstract: To address the semantic ambiguity caused by the lack of context information in short texts, this paper proposes a neural network model that combines a knowledge graph with an attention mechanism. An existing knowledge base is used to retrieve the concept set related to a short text, supplying prior knowledge that compensates for the missing context. The character vectors, word vectors, and concept set of the short text are taken as the input of the model. An encoder-decoder model then encodes the short text and its concept set, and the attention mechanism computes a weight for each concept to reduce the influence of unrelated, noisy concepts on classification. On this basis, a Bi-directional Gated Recurrent Unit (Bi-GRU) encodes the input sequence of the short text to extract classification features, enabling more accurate short text classification. Experimental results show that the model achieves accuracies of 73.95%, 40.69% and 63.10% on the AGNews, Ohsumed and TagMyNews short text datasets, respectively, demonstrating good classification ability.
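The concept-weighting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and variable names are hypothetical, the vectors stand in for pretrained embeddings, and the model's other components (character/word inputs, the encoder-decoder, and the Bi-GRU classifier) are omitted. It shows only the core idea of attention over a retrieved concept set: each concept is scored against the short-text representation, the scores are normalized with a softmax, and unrelated concepts therefore contribute little to the aggregated concept vector.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_concepts(text_vec, concept_vecs):
    """Weight each retrieved concept by its relevance to the short-text
    representation, so noisy, unrelated concepts are down-weighted.

    text_vec:     (d,)   vector representing the short text
    concept_vecs: (k, d) one embedding per retrieved concept
    Returns the attention weights and the weighted concept summary.
    """
    scores = concept_vecs @ text_vec   # dot-product relevance, one score per concept
    weights = softmax(scores)          # attention weights, non-negative, sum to 1
    summary = weights @ concept_vecs   # attention-weighted concept vector, shape (d,)
    return weights, summary

# Toy example: the first concept aligns with the text, the second does not.
text = np.array([1.0, 0.0])
concepts = np.array([[1.0, 0.0],   # relevant concept
                     [0.0, 1.0]])  # unrelated (noise) concept
w, s = attend_concepts(text, concepts)
# The relevant concept receives the larger weight: w[0] > w[1].
```

In the full model the summary vector would be combined with the Bi-GRU encoding of the text before classification; dot-product scoring is used here only as the simplest relevance measure.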

Key words: short text classification, knowledge graph, Natural Language Processing(NLP), attention mechanism, Bi-directional-Gated Recurrent Unit(Bi-GRU)
