
Computer Engineering ›› 2024, Vol. 50 ›› Issue (3): 298-305. doi: 10.19678/j.issn.1000-3428.0067504

• Development Research and Engineering Application •

Ground-based Cloud Map Prediction Algorithm Based on Spatio-temporal Long Short-term Memory Neural Network

Xian WU1, Kari TUSONGJIANG1,*, Hailong WANG2, Xiaojing MA1, Zhen'en LI1, Luo SHAO1

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830049, Xinjiang, China
  2. Beijing Zhimeng Information and Telecommunication Tech Co., Ltd., Beijing 100053, China
  • Received: 2023-04-26 Published: 2024-03-15 Online: 2023-07-05
  • Contact: Kari TUSONGJIANG
  • Supported by:
    National Natural Science Foundation of China (52067021); General Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C35)

Abstract:

To address the poor prediction accuracy and loss of spatial structural detail in traditional cloud motion trajectory prediction methods, a ground-based cloud map prediction model based on a Spatio-Temporal Long Short-Term Memory (ST-LSTM) neural network is proposed. First, a convolutional encoder network extracts high-dimensional image features from the input video stream. Then, the feature extraction model obtains the latent information of the images through multiple branches: one branch uses an ST-LSTM neural network to extract spatiotemporal features between frames, while the other decomposes the image sequence and passes the result through a memory fusion network based on a gating mechanism to capture the structural details in the decomposed images. Finally, the branch features are combined and passed through a decoder network to produce the predicted video stream. Experimental results on the ground-based cloud map, Moving MNIST, and Human 3.6M datasets show that the proposed model outperforms the comparison models in image prediction accuracy, retention of structural detail, and subjective visual quality. Compared with the baseline model TaylorNet, the proposed model reduces the Mean Squared Error (MSE) and Mean Absolute Error (MAE) by 15.7% and 11.8%, respectively, on the Moving MNIST dataset, and improves the Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR) by 1% and 3.2%, respectively, on the ground-based cloud map dataset. The generated video streams are also clearer and describe the future motion of clouds more accurately, which supports more reliable prediction of the future output power of photovoltaic power stations.
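The abstract mentions a memory fusion network based on a gating mechanism but does not give its equations. As a rough illustration only, the sketch below shows a generic GRU-style gate that blends a stored memory vector with new features. The function name `gated_memory_update` and the scalar parameters `w_m`, `w_x`, and `bias` are hypothetical and are not taken from the paper, whose actual network is likely convolutional and learned.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_memory_update(memory, new_features, w_m=1.0, w_x=1.0, bias=0.0):
    """Blend a stored memory with new features through an element-wise gate.

    Generic gating sketch (an assumption, not the paper's network):
        gate   = sigmoid(w_m * m + w_x * x + bias)
        output = gate * x + (1 - gate) * m
    A gate near 1 keeps the new features; a gate near 0 keeps the old memory.
    """
    if len(memory) != len(new_features):
        raise ValueError("memory and new_features must have the same length")
    fused = []
    for m, x in zip(memory, new_features):
        gate = sigmoid(w_m * m + w_x * x + bias)
        fused.append(gate * x + (1.0 - gate) * m)
    return fused


if __name__ == "__main__":
    # Hypothetical feature values, for illustration only.
    print(gated_memory_update([0.2, 0.8], [0.6, 0.1]))
```

Because the gate lies strictly between 0 and 1, each output element is a convex combination of the old memory and the new feature, which is the property that lets a gated unit retain detail selectively.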

Key words: deep learning, video prediction, ground-based cloud map, Maclaurin expansion, Spatio-Temporal Long Short-Term Memory (ST-LSTM) neural network
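The error metrics named in the abstract (MSE, MAE, PSNR) and the way a percentage reduction relative to a baseline is computed can be sketched as follows. This is a minimal illustration using the standard textbook definitions on flat pixel lists. The sample frames are made up, and the paper's exact normalization (for example, per-pixel mean versus per-frame sum) is not stated in the abstract, so it is assumed here to be a per-pixel mean. SSIM is omitted because it requires windowed statistics.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error between two equal-length pixel sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the pixel range peak."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical 4-pixel frames with intensities in [0, 1].
    target = [0.0, 0.5, 1.0, 0.5]
    pred = [0.1, 0.5, 0.9, 0.5]
    print("MSE:", mse(pred, target))
    print("MAE:", mae(pred, target))
    print("PSNR (dB):", psnr(pred, target))
    # A baseline error of 100 falling to 84.3 is a 15.7% reduction.
    print("Reduction (%):", relative_reduction(100.0, 84.3))
```

Lower is better for MSE and MAE, while higher is better for PSNR and SSIM, which is why the abstract reports reductions for the former pair and improvements for the latter.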