Computer Engineering, 2020, Vol. 46, Issue (12): 60-66, 72. doi: 10.19678/j.issn.1000-3428.0056255

• Artificial Intelligence and Pattern Recognition •

A Model for Online Forum Traffic Prediction Integrated with Multiple Models

LIAO Hanyue1,2, ZENG Jianping1,2, WU Chengrong1,2

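  1. School of Computer Science, Fudan University, Shanghai 200433, China;
    2. Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai 200433, China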
  • Received: 2019-10-11  Revised: 2019-12-12  Published: 2019-12-20
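  • About the authors: LIAO Hanyue (1996-), female, master's student, whose main research interests are machine learning and big data security; ZENG Jianping (corresponding author) and WU Chengrong are associate professors.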
  • Funding:
    National Key Research and Development Program of China, "Cyberspace Security" Key Special Project (2017YFB0803203).

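Abstract: Forum traffic prediction is important for tasks such as network planning and public opinion management. Linear prediction models cannot capture nonlinear relationships, while nonlinear prediction models require overly complex feature engineering. To address these problems, this paper uses historical time series as features and builds an ensemble of different algorithms to predict the number of forum posts. Four models, namely the Autoregressive Integrated Moving Average (ARIMA) model, the Long Short-Term Memory (LSTM) neural network, Prophet, and the Gradient Boosting Decision Tree (GBDT), each predict the time series separately. Following the idea of weighted voting, the models then vote at each time step of the series to select the interval of predicted values with the highest density. Each model's prediction is weighted according to the density of the interval in which it falls, and the final prediction is the weighted average of the individual predictions. Experimental results show that, compared with an arithmetic-average model and a weighted-average model based on Root Mean Square Error (RMSE), the proposed model achieves lower RMSE and lower relative error.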
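The combination step described above is given only in outline, so the Python sketch below is one possible reading of it, not the authors' implementation. It assumes that each model votes for the fixed-width interval containing its forecast (intervals of hypothetical width `width`, anchored at the smallest prediction), that a prediction's weight is the vote count of its interval, and that the output is the weighted average. For comparison it also sketches the two baselines named in the abstract, assuming the RMSE-based weighted average uses weights proportional to 1/RMSE measured on held-out data, plus RMSE and a mean relative error (whose exact definition is likewise an assumption). The four base forecasters (ARIMA, LSTM, Prophet, GBDT) are not implemented here; their one-step predictions are simply taken as inputs.

```python
import math
from collections import Counter
from typing import Sequence


def density_weighted_combine(preds: Sequence[float], width: float) -> float:
    """Combine one time step's base-model predictions by interval density.

    Assumed scheme (a sketch, not the paper's exact rule): each model votes
    for the fixed-width interval that contains its prediction; intervals are
    anchored at the smallest prediction. A prediction's weight is the number
    of votes its interval received, and the result is the weighted average.
    """
    if not preds:
        raise ValueError("need at least one prediction")
    if width <= 0:
        raise ValueError("width must be positive")
    lo = min(preds)
    # Interval (bin) index of each prediction, counted from the minimum.
    bins = [math.floor((p - lo) / width) for p in preds]
    votes = Counter(bins)               # votes per interval = its density
    weights = [votes[b] for b in bins]  # weight = density of own interval
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


def arithmetic_mean_combine(preds: Sequence[float]) -> float:
    """Baseline 1: equal-weight average of the model predictions."""
    return sum(preds) / len(preds)


def inverse_rmse_combine(preds: Sequence[float], rmses: Sequence[float]) -> float:
    """Baseline 2 (assumed form): weights proportional to 1 / RMSE.

    `rmses` holds each model's RMSE on held-out data, in the same order
    as `preds`; all values must be positive.
    """
    if any(r <= 0 for r in rmses):
        raise ValueError("RMSE values must be positive")
    inv = [1.0 / r for r in rmses]
    return sum(w * p for w, p in zip(inv, preds)) / sum(inv)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    sq = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |y - y_hat| / |y| (assumed definition; y must be nonzero)."""
    errs = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # Hypothetical one-step forecasts from ARIMA, LSTM, Prophet, GBDT.
    step_preds = [100.0, 104.0, 102.0, 140.0]
    print(density_weighted_combine(step_preds, width=10.0))  # 105.8
    print(arithmetic_mean_combine(step_preds))               # 111.5
```

With these made-up forecasts, three models cluster between 100 and 104 and the fourth (140) falls in an interval of its own. The density weights are therefore 3, 3, 3 and 1, and the combination (105.8) stays near the cluster, whereas the plain mean (111.5) is pulled toward the outlier. Fixed bins can split a tight cluster at a bin edge. Counting neighbours within a sliding window around each prediction is one alternative that avoids this.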

Key words: time series prediction, ensemble learning, forum traffic, hybrid prediction model, ensemble prediction

CLC number: