
Computer Engineering

• Artificial Intelligence and Recognition Technology •

An Infectious Disease Prediction Method Based on Division K-nearest Neighbor Algorithm

XIANG Xiaomin, GU Junzhong, WANG Yongming

  1. (Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China)
  • Received: 2015-01-08 Online: 2016-01-15 Published: 2016-01-15
  • About the authors: XIANG Xiaomin (born 1989), female, master's student, main research interest: data mining; GU Junzhong, professor; WANG Yongming, doctoral student.
  • Supported by:
    Shanghai International Science and Technology Cooperation Fund (13430710100); Science and Technology Innovation Action Plan Fund of the Science and Technology Commission of Shanghai Municipality (13511506201).

Abstract: Infectious disease prediction is an important application area of time series prediction. To address the low accuracy of commonly used infectious disease prediction algorithms, a nearest-neighbor algorithm based on data division is proposed that computes similarity among data from the same month. The infectious disease data are divided by month to obtain time series for the same month in different years. The K-nearest Neighbor (KNN) method is then used to compute the similarity between these time series, and the predicted values are obtained from the forecast sequence of the most similar time series. Experiments on diarrhea data from the Shanghai Municipal Center for Disease Control and Prevention show that the method takes full account of the effect of the month on the number of diarrhea cases. Compared with the continuous time series prediction algorithm based on KNN (the method before improvement), the Mean Absolute Error (MAE), Mean Percentage Error (MPE), and Root Mean Square Error (RMSE) are reduced by 38.52, 0.07, and 47.86, respectively. Compared with the traditional ARIMA forecasting method, MAE, MPE, and RMSE are reduced by 23.04, 0.07, and 28.12, respectively.
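The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea it describes: restrict the KNN search to series from the same calendar month in other years, then read the forecast from the values that follow the nearest series. The function names (`forecast_same_month`, `euclidean`, `mae`, `rmse`), the use of Euclidean distance, the prefix-matching scheme, and the averaging over `k` neighbors are all assumptions made for illustration, and the toy numbers are invented, not taken from the Shanghai data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def forecast_same_month(history, observed, horizon, k=1):
    """Forecast the next `horizon` values of the current month.

    history  : dict mapping year -> list of counts for the SAME calendar month
    observed : counts seen so far in that month of the current year
    horizon  : number of future values to predict
    k        : number of nearest past years whose continuations are averaged
    """
    w = len(observed)
    candidates = []
    for year, series in history.items():
        if len(series) < w + horizon:
            continue  # this year cannot supply a full continuation
        dist = euclidean(series[:w], observed)
        candidates.append((dist, year, series[w:w + horizon]))
    if not candidates:
        raise ValueError("no past year has enough data for this horizon")
    # Sort by distance, breaking ties by year so the result is deterministic.
    candidates.sort(key=lambda c: (c[0], c[1]))
    nearest = candidates[:k]
    return [sum(c[2][i] for c in nearest) / len(nearest) for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error (standard definition)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error (standard definition)."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Invented toy data: counts for the same month in three past years.
july = {
    2011: [10, 12, 15, 20, 22],
    2012: [30, 33, 35, 40, 41],
    2013: [11, 13, 14, 19, 21],
}
prediction = forecast_same_month(july, observed=[11, 12, 15], horizon=2, k=1)
print(prediction)  # [20.0, 22.0], the continuation of 2011, the nearest year
print(mae([21, 23], prediction), rmse([21, 23], prediction))
```

Limiting the candidate set to one calendar month is what separates this sketch from a KNN search over the whole continuous series: a window from a different season can never be chosen as a neighbor, which is one way to read the abstract's claim that the method accounts for the influence of the month.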

Key words: prediction, infectious disease prediction, K-nearest Neighbor (KNN) algorithm, time series, similarity calculation

CLC number: