Computer Engineering

Study of TV Drama on Demand Ranking Prediction Fused with Social and Search Data

XU Xiaofeng, HE Liang, YANG Jing

  1. (Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China)
  • Received: 2014-07-23 Online: 2015-08-15 Published: 2015-08-15

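  • About the authors: XU Xiaofeng (born 1990), male, master's student, main research interests: data mining and information extraction; HE Liang, professor, Ph.D., doctoral supervisor; YANG Jing, associate professor, Ph.D.
  • Funding:
    National Key Technology R&D Program of China (2012BAH74F02); Shanghai International Science and Technology Cooperation Fund (13430710100).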

Abstract: Existing methods for predicting the popularity of film and TV videos rely solely on either social network data or search engine data. They fail to capture how audience demand changes over time, and their prediction accuracy is low. To address this, the paper aims to predict how TV dramas will rank by average on-demand volume in a Video on Demand (VOD) system over a future period. It applies a multiple linear regression model to social network features that correlate significantly with on-demand volume, Sina Weibo (microblog) data collected before the premiere, and Baidu search data collected after the premiere. Experimental results show that fusing pre-premiere social network data with search data from three days after the premiere outperforms methods that use social network data or search data alone and reflects audience demand more accurately. The Spearman correlation coefficient between the predicted and actual rankings reaches about 0.82 on YouKu and 0.89 on IQiyi for TV dramas newly released in 2014. The method can help video operators make copyright purchasing decisions.

Key words: Video on Demand (VOD) system, TV drama on-demand volume ranking, social network, search index, multiple regression
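
The approach summarized in the abstract, a multiple linear regression over fused features followed by a Spearman comparison of predicted and actual rankings, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the two feature columns (a pre-premiere microblog count and a post-premiere search index), the toy numbers, and the helper names `fit_linear_regression`, `predict`, `ranks`, and `spearman` are hypothetical and are not taken from the paper, whose actual features and data are not listed in this abstract.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (A^T A) w = A^T y.

    A is X with a leading column of ones, so w[0] is the intercept.
    """
    A = [[1.0, *row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(k):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [a - factor * b for a, b in zip(M[row], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply the fitted intercept and coefficients to each feature row."""
    return [w[0] + sum(c * v for c, v in zip(w[1:], row)) for row in X]


def ranks(values: Sequence[float]) -> List[float]:
    """Rank 1 is the largest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman coefficient computed as the Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Invented toy data, one row per drama:
# [microblog mentions before premiere, search index after premiere].
features = [[120, 30], [80, 45], [200, 60], [50, 10], [150, 55], [90, 20]]
# Invented average on-demand volume per drama.
volume = [410, 380, 690, 150, 560, 270]

weights = fit_linear_regression(features, volume)
predicted = predict(weights, features)
# In-sample check only; a real evaluation would use held-out dramas.
rho = spearman(predicted, volume)
print(f"Spearman correlation between predicted and actual ranking: {rho:.2f}")
```

The sketch ranks dramas by predicted volume rather than predicting rank positions directly, which is one plausible reading of the method; a production version would more likely use an established library for the regression and the correlation.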
