Computer Engineering ›› 2020, Vol. 46 ›› Issue (11): 70-76. doi: 10.19678/j.issn.1000-3428.0055990

• Artificial Intelligence and Pattern Recognition •

A Classification Algorithm for Tourist Question Texts Integrated with Deep Learning Models

MA Zhekang (a,c), Diliyaer Paerhati (a,c), Zaokere Kadeer (b,c), Tuergen Yibulayin (b,c), Xerali Setti (b,c), Aishan Wumaier (b,c)

  1. a. College of Software; b. College of Information Science and Engineering; c. Xinjiang Key Laboratory of Multi-language Information Technology, Xinjiang University, Urumqi 830046, China
  • Received: 2019-09-11 Revised: 2019-11-06 Published: 2019-11-12
  • About the authors: MA Zhekang (born 1995), male, master's student, main research interests: natural language processing and question text classification; Diliyaer Paerhati, master's student; Zaokere Kadeer (corresponding author), experimentalist, M.S.; Tuergen Yibulayin, professor, Ph.D.; Xerali Setti, master's student; Aishan Wumaier, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61762084); National Key Research and Development Program of China (2017YFB1002103); Open Project of the Key Laboratory of Xinjiang Uygur Autonomous Region (2018D04019).

Abstract: To improve the utilization of key features in tourist question texts, this paper proposes a classification algorithm that integrates a Word Level Convolutional Neural Network (WL-CNN) with a Sentence Level Bi-directional Long Short-Term Memory (SL-Bi-LSTM) network. WL-CNN learns subspace vectors of the word sequence, and SL-Bi-LSTM learns deep semantic information of the sentence sequence. A Multi-Head Attention Mechanism (MH-AM) then integrates the two deep learning models so that the syntactic and semantic information of tourist question texts complement each other, and a SoftMax classifier produces the final classification result. Experimental results show that, compared with tourist question text classification algorithms based on traditional deep learning models, the proposed algorithm achieves the best accuracy (0.9866) and the best loss rate (0.1277), indicating better classification performance.

Key words: subspace structure information, deep semantic information, Multi-Head Attention Mechanism (MH-AM), Convolutional Neural Network (CNN), Bi-directional Long Short-Term Memory (Bi-LSTM) network
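The integration step described in the abstract (two feature streams combined by multi-head attention, followed by a SoftMax classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer sizes, how the two streams are combined before attention, or the pooling step, so the concatenation along the sequence axis, the mean pooling, the random untrained weights, and all function names below are assumptions made for illustration. The WL-CNN and SL-Bi-LSTM branches are replaced by random stand-in feature matrices.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng):
    """Untrained random weights or stand-in features (assumption, not learned)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def attention_head(q, k, v):
    """Scaled dot-product attention for a single head."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head_attention(x, num_heads, rng):
    """Self-attention over x (seq_len x dim) with one random projection set per head."""
    dim = len(x[0])
    head_dim = dim // num_heads
    heads = []
    for _ in range(num_heads):
        wq, wk, wv = (random_matrix(dim, head_dim, rng) for _ in range(3))
        heads.append(attention_head(matmul(x, wq), matmul(x, wk), matmul(x, wv)))
    # Concatenate the head outputs position by position.
    return [sum((h[i] for h in heads), []) for i in range(len(x))]

def classify(word_feats, sent_feats, num_classes, num_heads=2, seed=0):
    """Fuse two feature sequences with multi-head attention, then apply softmax."""
    rng = random.Random(seed)
    fused = word_feats + sent_feats  # assumed fusion: concatenate along the sequence axis
    attended = multi_head_attention(fused, num_heads, rng)
    dim = len(attended[0])
    # Assumed pooling: average over all positions.
    pooled = [sum(row[j] for row in attended) / len(attended) for j in range(dim)]
    logits = matmul([pooled], random_matrix(dim, num_classes, rng))[0]
    return softmax(logits)

rng = random.Random(42)
word_feats = random_matrix(5, 8, rng)  # stand-in for WL-CNN output (5 positions, 8 dims)
sent_feats = random_matrix(3, 8, rng)  # stand-in for SL-Bi-LSTM output (3 positions, 8 dims)
probs = classify(word_feats, sent_feats, num_classes=4)
print(probs)
```

Because the weights are random and untrained, the printed probabilities carry no meaning; the sketch only shows how the data flows from the two feature streams through attention to a class distribution. In a real system the projections and the output layer would be trained jointly with both branches.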

CLC Number: