Computer Engineering, 2012, Vol. 38, Issue (9): 288-290. doi: 10.3969/j.issn.1000-3428.2012.09.088

• Development Research and Design Technology •

Research on Sudden Topic Detection Method for Microblog

QIU Yun-fei a, CHENG Liang b

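  (a. School of Software; b. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China)
  • Received: 2011-12-07; Online: 2012-05-05; Published: 2012-05-05
  • About the authors: QIU Yun-fei (1976-), male, associate professor, main research interest: data mining; CHENG Liang, master's student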

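Abstract: Topic Detection and Tracking (TDT) models cannot handle microblog short messages well, because these messages are highly informal and full of non-standard wording. To address this, a sudden-topic detection method for microblogs based on a dynamic sliding window is proposed. The window extracts messages that are potentially bursty; a normalized Term Frequency-Inverse Document Frequency (TF-IDF) function combined with semantic information computes the feature weights, from which a semantics-aware Vector Space Model (VSM) is built; and an improved variant of the Single-Pass clustering algorithm produces the final clusters. Experimental results show that the algorithm yields fairly accurate sudden-topic detection results.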

Key words: microblog, sudden topic, sliding window, semantic similarity, Vector Space Model (VSM), Topic Detection and Tracking (TDT)
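The pipeline summarized in the abstract (window extraction, term weighting, vector space model, Single-Pass clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The window here is a fixed-width time window rather than their dynamic one. The weights use the common normalized TF-IDF form tf * log(N/df + 0.01) with L2 normalization, and they omit the paper's semantic component. The function names, the window width, and the 0.2 similarity threshold are all hypothetical. Messages are assumed to be already word-segmented.

```python
import math
from collections import Counter


def window_messages(messages, now, width):
    """Return token lists of (timestamp, tokens) pairs with now - width < ts <= now.

    Hypothetical fixed-width window; the paper's dynamic window is not modeled.
    """
    return [tokens for ts, tokens in messages if now - width < ts <= now]


def normalized_tfidf(docs):
    """Weight each term as tf * log(N / df + 0.01), then L2-normalize each vector.

    The semantic adjustment described in the paper is omitted (assumption).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: c * math.log(n / df[t] + 0.01) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass(vectors, threshold):
    """Plain Single-Pass clustering (not the paper's improved variant).

    Each vector, in arrival order, joins the cluster whose centroid is most
    similar if that similarity reaches the threshold; otherwise it opens a
    new cluster. Returns lists of vector indices.
    """
    centroids, members = [], []
    for i, vec in enumerate(vectors):
        best, best_sim = -1, -1.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(i)
            size = len(members[best])
            centroid = centroids[best]
            # Incremental mean update: c <- c + (vec - c) / size
            for t in set(centroid) | set(vec):
                old = centroid.get(t, 0.0)
                centroid[t] = old + (vec.get(t, 0.0) - old) / size
        else:
            centroids.append(dict(vec))
            members.append([i])
    return members


if __name__ == "__main__":
    messages = [
        (0, ["earthquake", "city", "buildings"]),
        (5, ["earthquake", "city", "rescue"]),
        (7, ["concert", "tickets", "sold"]),
        (9, ["concert", "tickets", "singer"]),
        (-100, ["old", "news"]),  # falls outside the window
    ]
    docs = window_messages(messages, now=10, width=60)
    print(single_pass(normalized_tfidf(docs), threshold=0.2))  # → [[0, 1], [2, 3]]
```

Single-Pass clustering visits each message once, in arrival order, which suits a stream of microblog posts. The price is that the clusters depend on that order and on the similarity threshold, so the threshold usually has to be tuned on real data.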
