
Computer Engineering ›› 2020, Vol. 46 ›› Issue (12): 290-298. doi: 10.19678/j.issn.1000-3428.0055891

• Development Research and Engineering Application •

Random Forest Algorithm Based on Data Integration

XIE Kun, RONG Yutian, HU Fengping, CHEN Huan, YAO Xiaolong

  1. Research and Development Center of Big Data and Blockchain, SF Technology Co., Ltd., Shenzhen, Guangdong 518000, China
  • Received: 2019-09-03 Revised: 2019-11-08 Published: 2019-12-09
  • About the authors: XIE Kun (born 1992), male, Ph.D.; main research interests: machine learning, operations research, and natural language processing. RONG Yutian and HU Fengping, M.S.; CHEN Huan, Ph.D.; YAO Xiaolong (corresponding author), M.S.
  • Funding:
    Shenzhen Development and Reform Commission Special Project for the Development of Strategic Emerging Industries, "Research, Development and Industrialization of an Intelligent Logistics System Based on Artificial Intelligence Technology".

Abstract: Historical data used for sales forecasting is sparse and volatile, so traditional statistical and machine learning prediction algorithms perform poorly when the prediction cycle is long. To address this, the paper draws on the ensemble idea of Random Forest (RF) and on random partitioning and recombination of the training data set to propose an RF algorithm based on data integration. The algorithm randomly recombines the original one-dimensional prediction variable into high-dimensional variables and takes the sum of the outputs as the final prediction value. Experimental results show that, compared with traditional algorithms including ARIMA, RF, and GBDT, the proposed algorithm significantly improves prediction performance on a real-world data set. Extended experiments further show that data integration can also be applied to the ARIMA algorithm, improving its prediction accuracy by about 3%.

Key words: sales forecasting, time series prediction, machine learning, data integration, Random Forest (RF)
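
The abstract gives only an outline of the method, so the following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes that "random recombination" means splitting each value of the one-dimensional sales series into k random non-negative parts that sum back to the original value, forecasting each component series separately, and adding the component forecasts. The names `random_split`, `naive_forecast`, and `integrated_forecast` are hypothetical, and a simple moving-average forecaster stands in for the Random Forest model to keep the example dependency-free.

```python
import random
from statistics import mean


def random_split(series, k, rng):
    """Split every value of a 1-D series into k non-negative parts.

    Assumption (not stated in the abstract): the parts of each value
    are drawn with random proportions and sum back to that value.
    """
    parts = [[] for _ in range(k)]
    for value in series:
        # Small offset avoids a zero total in the normalisation below.
        weights = [rng.random() + 1e-12 for _ in range(k)]
        total = sum(weights)
        for j in range(k):
            parts[j].append(value * weights[j] / total)
    return parts


def naive_forecast(history, window=3):
    """Stand-in forecaster (hypothetical): mean of the last `window` values."""
    return mean(history[-window:])


def integrated_forecast(series, k=4, seed=0):
    """Forecast each random component separately, then sum the outputs."""
    rng = random.Random(seed)
    components = random_split(series, k, rng)
    return sum(naive_forecast(component) for component in components)


# Illustrative, made-up sparse and volatile sales history.
sales = [12.0, 0.0, 7.0, 30.0, 0.0, 5.0, 18.0, 9.0]
components = random_split(sales, 4, random.Random(0))
forecast = integrated_forecast(sales)
```

Because the stand-in forecaster is linear, the summed forecast here equals the forecast made directly on the original series, which serves only as a sanity check that the split preserves the total. A nonlinear learner such as a random forest would generally give a different result on the components than on the original series.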
