Computer Engineering, 2020, Vol. 46, Issue (11): 104-108. doi: 10.19678/j.issn.1000-3428.0055952

• Artificial Intelligence and Pattern Recognition •

Topic Model Combining Topic Word Embedding and Attention Mechanism

QIN Tingting, LIU Zheng, CHEN Kejia

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2019-09-08 Revised: 2019-11-05 Published: 2019-11-12
  • About the authors: QIN Tingting (born 1994), female, master's student, main research interest in natural language processing; LIU Zheng (corresponding author), lecturer, Ph.D.; CHEN Kejia, associate professor, Ph.D.
  • Funding:
    Nanjing University of Posts and Telecommunications Scientific Research Start-up Fund for Introduced Talents (NY215045); Nanjing University of Posts and Telecommunications National Natural Science Foundation Incubation Project (NY219084).

Abstract: With the popularity of social software, mining useful information from massive amounts of digital text has become an active research topic. Classic topic models such as LDA and LSA capture topic information based on word co-occurrence and ignore the position (context) information of words. To address this problem, this paper designs an attention mechanism between topics and words, integrates topic information and word information into the LDA framework, and constructs a topic model named JEA-LDA. The model uses the attention mechanism between words and topics to fuse word information and topic information into a feature representation, which is then used for topic extraction in the LDA model. Experimental results show that, compared with LDA, DMM and other models, the proposed model achieves higher topic coherence and better classification performance, and thus yields better topic extraction results.

Key words: topic model, word embedding, topic embedding, attention mechanism, LDA model
