Computer Engineering, 2019, Vol. 45, Issue (10): 208-214. doi: 10.19678/j.issn.1000-3428.0054297

• Artificial Intelligence and Recognition Technology •

A Text Classification Algorithm Based on Neural Network and LDA

NIU Shuoshuo, CHAI Xiaoli, LI Deqi, XIE Bin

  1. The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
  • Received: 2019-03-20 Revised: 2019-04-23 Online: 2019-10-15 Published: 2019-10-09
  • About the authors: NIU Shuoshuo (born 1993), male, master's student, whose main research interests are machine learning and natural language processing; CHAI Xiaoli, research fellow; LI Deqi, engineer; XIE Bin, research fellow, Ph.D.
  • Funding:
    National ministry and commission fund.

Abstract: When applied to text classification, the traditional Latent Dirichlet Allocation (LDA) topic model uses Gibbs sampling to fit unknown parameters under known conditional distributions, which makes it difficult to balance classification accuracy against computational complexity. To address this, a text classification algorithm named NLDA is proposed that builds on the LDA topic model and uses a neural network to fit the word-topic probability distribution. Experiments on the THUCNews corpus and the Fudan University corpus show that, compared with the traditional LDA model, the algorithm improves average classification accuracy by 5.53% and 4.67%, respectively, and reduces average training time by 8% and 10%, respectively.

Key words: text classification, deep learning, neural network, Latent Dirichlet Allocation (LDA), topic model
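
For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of standard collapsed Gibbs sampling for LDA. It is not the authors' NLDA algorithm and not their implementation; the function name `lda_gibbs`, the hyperparameter values, and the toy corpus are assumptions chosen only for illustration. It shows why each sweep costs roughly (number of tokens) × (number of topics) operations, which is the kind of computational cost the abstract refers to.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids.

    Illustrative sketch only: alpha, beta, and iters are assumed values.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi

# Toy corpus (hypothetical): three short documents over a vocabulary of four word ids.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=4)
```

In a classification setting, the rows of `theta` would typically serve as document features for a downstream classifier. According to the abstract, NLDA instead fits the word-topic distribution (the role played by `phi` here) with a neural network.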

CLC Number: