
Computer Engineering (计算机工程)

• Advanced Computing and Data Processing •

Research on Hotness Prediction in Sina Microblog Based on Forward Level Analysis

ZHAI Xiaofang 1, LIU Quanming 1, CHENG Yaodong 2, HU Qingbao 2, LI Haibo 2

  1. College of Computer & Information Technology, Shanxi University, Taiyuan 030006, China
  2. Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2014-07-08 Online: 2015-07-15 Published: 2015-07-15
  • About the authors: ZHAI Xiaofang (born 1990), female, master's student, main research interest: network data processing; LIU Quanming, associate professor, Ph.D.; CHENG Yaodong, associate research fellow, Ph.D.; HU Qingbao, assistant research fellow, M.S.; LI Haibo, assistant research fellow, Ph.D.
  • Supported by:

    National "863" Program project "Public Information Consumption Service Platform and Application Demonstration Based on Media Big Data" (SS2014AA012305).

Abstract:

As a new medium for spreading messages, the microblog has surpassed traditional mainstream media in both influence and propagation speed, so predicting microblog hotness is important for public opinion monitoring, government publicity, corporate marketing, and hot-topic pushing. By analyzing the level structure of microblog reposts and combining the repost count with repost depth and breadth indicators, this paper defines a new method for calculating a hotness index. Microblog hotness is divided into five levels, and for microblogs with more than 100 reposts the paper predicts the probability that their hotness reaches a given level. Using a supervised machine learning algorithm, it extracts first the static features and then the dynamic features of the training samples to train the hotness prediction model. The training samples come from Sina Microblog and are collected with the self-developed BigData open crawler platform. Experiments with 10-fold cross-validation show that, compared with a hotness prediction model that uses only static features, adding dynamic microblog features effectively improves prediction performance, with the average F1 value reaching 76.9%.

Key words: microblog, crawler, static feature, dynamic feature, hotness index, multi-classification problem
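The abstract names three indicators that feed the hotness index: repost count, repost depth, and repost breadth. The sketch below shows one plausible way to read them off a repost tree. It is an illustration under stated assumptions, not the paper's method: the function name `repost_tree_metrics`, the `(post_id, parent_id)` input format, and the definitions of depth (number of repost levels below the original post) and breadth (size of the largest level) are all hypothetical. The abstract does not give the formula that combines the three values, so the sketch stops at the indicators themselves.

```python
from collections import defaultdict, deque


def repost_tree_metrics(reposts, root):
    """Return (count, depth, breadth) for one repost cascade.

    reposts: iterable of (post_id, parent_id) pairs, one per repost
             (assumed input format, not taken from the paper).
    root:    id of the original microblog.
    """
    children = defaultdict(list)
    for post_id, parent_id in reposts:
        children[parent_id].append(post_id)

    # level_sizes[k] holds the number of reposts at level k + 1,
    # where level 1 means a direct repost of the original post.
    level_sizes = []
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        node, level = queue.popleft()
        for child in children[node]:
            if child in seen:
                continue  # ignore duplicate records
            seen.add(child)
            if level == len(level_sizes):
                level_sizes.append(0)
            level_sizes[level] += 1
            queue.append((child, level + 1))

    count = sum(level_sizes)               # reposts reachable from the root
    depth = len(level_sizes)               # assumed: deepest repost level
    breadth = max(level_sizes, default=0)  # assumed: largest single level
    return count, depth, breadth


# Toy cascade: "a" and "b" repost the original, three users repost "a",
# and one user reposts "c".
example = [
    ("a", "root"), ("b", "root"),
    ("c", "a"), ("d", "a"), ("e", "a"),
    ("f", "c"),
]
print(repost_tree_metrics(example, "root"))  # (6, 3, 3)
```

A breadth-first traversal is used here because it visits the cascade level by level, which makes the per-level counts fall out directly; reposts whose parent chain never reaches the original post are left out of all three values.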

CLC Number: