摘要: 针对电子公告栏(BBS)内容演化过程中话题数量动态变化的特点,提出基于潜在狄利克雷分布的自适应在线话题演化模型。该模型以历史时间窗口中话题、词分布的后验线性加权调节当前时间窗口中话题、词分布的先验,给出在线新话题检测和消亡话题检测方法,自动适应数据流中的话题数量。实验结果表明,该模型能有效识别BBS内容演化过程中话题的产生与消亡,分析它们在时间和内容上的演化,及时发现热点事件。
关键词: 网络舆情, 话题模型, 话题演化, 非监督学习, 多项式分布, 时间窗口
Abstract: To address the dynamically changing number of topics during the evolution of Bulletin Board System (BBS) content, an adaptive on-line topic evolution model based on Latent Dirichlet Allocation (LDA) is proposed. The model uses a linearly weighted combination of the posterior topic and word distributions from historical time windows to adjust the priors of the topic and word distributions in the current time window. It also provides on-line methods for detecting new topics and vanished topics, so that the number of topics adapts automatically to the data stream. Experimental results show that the model effectively identifies the emergence and disappearance of topics as BBS content evolves, analyzes their evolution in time and content, and discovers hot events promptly.
Key words: Internet public opinion, topic model, topic evolution, unsupervised learning, multinomial distribution, time window
杨春明,张晖,石大文. BBS网络舆情的在线自适应话题演化模型[J]. 计算机工程.
YANG Chun-ming, ZHANG Hui, SHI Da-wen. Adaptive On-line Topic Evolution Model of Internet Public Opinion for BBS[J]. Computer Engineering.