
Computer Engineering, 2019, Vol. 45, Issue (1): 210-216. doi: 10.19678/j.issn.1000-3428.0049745


Biterm Topic Model Based on Semantic Extension of Double Words

LI Siyu, XIE Jun, ZOU Xuejun, XU Xinying, JI Xiaoping

  1. College of Information Engineering, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2017-12-19 Online: 2019-01-15 Published: 2019-01-15

基于双词语义扩展的Biterm主题模型

李思宇,谢珺,邹雪君,续欣莹,冀小平   

  1. 太原理工大学 信息工程学院,山西 晋中 030600
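  • About the authors: LI Siyu (born 1992), male, master's degree, main research interests: natural language processing and text topic models; XIE Jun, associate professor, Ph.D.; ZOU Xuejun, master's degree; XU Xinying and JI Xiaoping, associate professors, Ph.D.
  • Supported by:

    Research Project for Returned Overseas Scholars of Shanxi Province (2015-045).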

Abstract:

The Biterm Topic Model (BTM) generates biterms (word pairs) from short-text documents without considering whether the two words in a pair are semantically related. To address this, a BTM based on biterm semantic extension is proposed. A word vector model is introduced to capture the semantic relationship between the two words of a biterm: the word vector model is trained, the semantic distance between words is measured, and the BTM is extended with biterms according to that distance. Experimental results show that, compared with the existing BTM, the proposed model achieves better topic classification on short texts and also markedly improves the semantic relatedness of biterms and the clustering of topic word meanings.

Key words: Biterm Topic Model (BTM), biterm, word vector, biterm semantics, Gibbs sampling
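
The abstract describes measuring the semantic distance between the two words of a biterm with word vectors. The following is a minimal sketch of that idea only, not the authors' method: the toy vectors, the cosine-similarity measure, the `threshold` value, and the function names are all assumptions made for illustration, and the actual extension rule and Gibbs sampling procedure of the paper are not reproduced here.

```python
from itertools import combinations
from math import sqrt

# Toy 3-dimensional word vectors (hypothetical values, not trained embeddings).
TOY_VECTORS = {
    "apple": [0.9, 0.1, 0.0],
    "fruit": [0.8, 0.2, 0.1],
    "phone": [0.1, 0.9, 0.2],
    "screen": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_biterms(tokens):
    """Return every unordered word pair co-occurring in one short text."""
    return list(combinations(tokens, 2))


def score_biterms(tokens, vectors, threshold=0.8):
    """Score each biterm by the similarity of its two words.

    Returns a list of (biterm, similarity, is_related) tuples. Pairs with a
    word missing from `vectors` are skipped. `threshold` is an assumed cutoff.
    """
    scored = []
    for w1, w2 in extract_biterms(tokens):
        if w1 in vectors and w2 in vectors:
            sim = cosine(vectors[w1], vectors[w2])
            scored.append(((w1, w2), sim, sim >= threshold))
    return scored


if __name__ == "__main__":
    for biterm, sim, related in score_biterms(["apple", "fruit", "phone"], TOY_VECTORS):
        print(biterm, round(sim, 3), related)
```

In a sketch like this, biterms flagged as related could be given more weight or used to add further semantically close pairs before topic inference; which of these the paper does is not stated in the abstract.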

