Computer Engineering, 2019, Vol. 45, Issue (7): 303-308, 314. doi: 10.19678/j.issn.1000-3428.0051312

• Development Research and Engineering Application •

News Text Classification Based on CNLSTM Model with Attention Mechanism

LIU Yue, ZHAI Donghai, REN Qingning

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610097, China
  • Received: 2018-04-23 Revised: 2018-06-07 Online: 2019-07-15 Published: 2019-07-15
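  • About the authors: LIU Yue (born 1993), female, master's student, whose main research interests are data mining and natural language processing; ZHAI Donghai, associate professor, Ph.D.; REN Qingning, master's student.
  • Fund:
    National Natural Science Foundation of China (61540060).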

Abstract: Combining Convolutional Neural Network (CNN) and Nested Long Short-Term Memory (NLSTM) models, this paper proposes an attention-based CNLSTM model for text representation and classification. The model uses a CNN to extract features of phrase sequences and then uses an NLSTM to learn text feature representations. An attention mechanism is introduced to highlight key phrases and thereby optimize feature extraction. Experiments on three public news datasets show that the model achieves classification accuracies of 96.87%, 95.43%, and 97.58%, respectively, a significant improvement over the baseline methods.

Key words: Convolutional Neural Network (CNN), feature representation, Nested Long Short-Term Memory (NLSTM), attention mechanism, text classification
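To make the CNN, NLSTM and attention pipeline in the abstract more concrete, here is a minimal, untrained forward-pass sketch in pure Python. It relies on several assumptions that the abstract does not state. The sizes `EMB`, `FILT`, `HID`, `ATT`, `CLASSES` and `WIN` are hypothetical. It uses a single convolution window with ReLU and tanh activations throughout a simplified nested LSTM, in which the outer cell's memory update is computed by an inner LSTM. Attention is modeled as additive attention pooling over the NLSTM hidden states, one per phrase, although the paper may place it elsewhere. All function names are illustrative and the weights are random, so the sketch does not reproduce the authors' model or their reported accuracies.

```python
# Hypothetical sketch of a CNN -> nested LSTM -> attention -> softmax forward pass.
# Not the authors' implementation; sizes, activations and attention placement are assumptions.
import math
import random

EMB, FILT, HID, ATT, CLASSES, WIN = 4, 6, 5, 3, 3, 3  # toy sizes (hypothetical)

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(*vs):
    return [sum(xs) for xs in zip(*vs)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def conv_phrases(emb, w, b):
    """1-D convolution (window WIN) + ReLU: one feature vector per n-gram phrase."""
    out = []
    for t in range(len(emb) - WIN + 1):
        window = [x for vec in emb[t:t + WIN] for x in vec]  # concatenate WIN word vectors
        out.append([max(0.0, z) for z in add(matvec(w, window), b)])
    return out

def lstm_gates(p, x, h):
    """Pre-activations W_k x + U_k h + b_k for gates i, f, o and candidate g."""
    return {k: add(matvec(p["W" + k], x), matvec(p["U" + k], h), p["b" + k]) for k in "ifog"}

def nlstm(seq, outer, inner):
    """Simplified nested LSTM: an inner LSTM computes the outer cell's memory."""
    h = [0.0] * HID     # outer hidden state
    c = [0.0] * HID     # outer memory (= inner hidden state)
    c_in = [0.0] * HID  # inner LSTM memory
    states = []
    for x in seq:
        g = lstm_gates(outer, x, h)
        i, f, o = sigmoid(g["i"]), sigmoid(g["f"]), sigmoid(g["o"])
        x_in = mul(i, tanh(g["g"]))  # inner input: i_t * tanh(candidate)
        h_in = mul(f, c)             # inner hidden: f_t * c_{t-1}
        gi = lstm_gates(inner, x_in, h_in)
        ii, fi, oi = sigmoid(gi["i"]), sigmoid(gi["f"]), sigmoid(gi["o"])
        c_in = add(mul(fi, c_in), mul(ii, tanh(gi["g"])))
        c = mul(oi, tanh(c_in))      # inner output becomes the outer memory
        h = mul(o, tanh(c))
        states.append(h)
    return states

def attention(states, w, b, v):
    """Additive attention: score_t = v . tanh(W h_t + b); context = sum_t alpha_t h_t."""
    scores = [sum(a * s for a, s in zip(v, tanh(add(matvec(w, h), b)))) for h in states]
    alpha = softmax(scores)
    context = [sum(a * h[j] for a, h in zip(alpha, states)) for j in range(HID)]
    return alpha, context

def make_params(seed=0):
    rng = random.Random(seed)
    def cell(in_dim):
        p = {}
        for k in "ifog":
            p["W" + k] = rand_matrix(rng, HID, in_dim)
            p["U" + k] = rand_matrix(rng, HID, HID)
            p["b" + k] = [0.0] * HID
        return p
    return {
        "conv_w": rand_matrix(rng, FILT, EMB * WIN), "conv_b": [0.0] * FILT,
        "outer": cell(FILT), "inner": cell(HID),
        "att_w": rand_matrix(rng, ATT, HID), "att_b": [0.0] * ATT,
        "att_v": [rng.uniform(-0.5, 0.5) for _ in range(ATT)],
        "out_w": rand_matrix(rng, CLASSES, HID), "out_b": [0.0] * CLASSES,
    }

def classify(emb, p):
    phrases = conv_phrases(emb, p["conv_w"], p["conv_b"])
    states = nlstm(phrases, p["outer"], p["inner"])
    alpha, context = attention(states, p["att_w"], p["att_b"], p["att_v"])
    return alpha, softmax(add(matvec(p["out_w"], context), p["out_b"]))

rng = random.Random(1)
sentence = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(8)]  # 8 toy word vectors
alpha, probs = classify(sentence, make_params())
print(len(alpha), len(probs))  # 6 3
```

Each convolution window spans `WIN` words, so an 8-word input produces 6 phrase vectors and 6 attention weights. In a trained model, the larger weights are meant to mark the phrases the classifier relies on most, which matches the abstract's description of attention as "highlighting key phrases."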

CLC number: