Second Prediction of the Deep Point Process Based on Conditional Generative Adversarial Network

Cite this article

BIAN Wei, LI Chenlong, HOU Hongwei. Second Prediction of the Deep Point Process Based on Conditional Generative Adversarial Network[J]. Computer Engineering, 2022, 48(12): 127-133. DOI: 10.19678/j.issn.1000-3428.0063648.

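Funding

National Natural Science Foundation of China (61901294); Applied Basic Research Program of Shanxi Province (201901D211105)

Corresponding author

LI Chenlong, lecturer, Ph.D.

About the authors

BIAN Wei (born 1994), male, master's student; main research interests: point processes and deep learning;
HOU Hongwei, associate professor, Ph.D.

Article history

Received: 2021-12-29
Revised: 2022-02-07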


BIAN Wei, LI Chenlong, HOU Hongwei

School of Mathematics, Taiyuan University of Technology, Taiyuan 030000, China

E-mail: lichenlong@tyut.edu.cn

Abstract: Deep point process models, which combine deep neural networks with temporal point processes, often produce predicted time sequences with large deviations because of the systematic error of the model itself and the limited accuracy of numerical computation. To improve prediction accuracy while avoiding model tuning and reducing numerical error, a second prediction model for deep point processes based on a Conditional Generative Adversarial Network (CGAN) is established, which performs a second prediction on top of the first-prediction sequence of the deep point process. Under the assumption that the first-prediction deviation stems from a difference in temporal point process distributions, the distribution-transforming ability of the CGAN is used to correct the distribution of the first-prediction sequence toward that of the original temporal point process sequence, thereby reducing the error of the predicted sequence. In the procedure, the first-prediction sequence is fed into the generator to produce a pseudo-value sequence, and the pseudo-value sequence and the corresponding true sequence are fed into the discriminator to be judged real or fake. Adversarial training then yields a generator that can correct the first-prediction sequence. To better match the CGAN to temporal point process data, a CGAN+LSTM structure is adopted, and the loss function is changed to the dual form of the Wasserstein distance between temporal point processes together with its 1-Lipschitz constraint. Experimental results show that the model achieves high time-prediction accuracy: the Root Mean Square Error (RMSE) of the second-prediction sequence is on average more than 77% lower than that of the first-prediction sequence.


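0 Introduction

A temporal point process is a probabilistic generative model for event sequences with irregular time intervals, and it has become an effective mathematical tool for modeling such data in practice^[1]. A temporal point process embeds the dependence on historical events in the expression of its intensity function, from which the probability density function of the inter-event time is derived^[2] and then used for prediction and analysis. However, traditional statistical modeling strictly constrains the form of the intensity function, which limits modeling capacity. To address this problem, DU et al.^[3] proposed embedding the historical information in the hidden state of a Recurrent Neural Network (RNN), using deep learning to broaden the ways in which temporal point processes can be modeled. Many RNN-based temporal point process models^[4-5] (referred to in this paper as deep point processes) have since emerged, making the intensity function or the inter-event time density more flexible^[6-7]. Deep point processes now match or even surpass traditional point process models, providing more effective solutions for applying temporal point processes in practice. However, this flexibility rests on nonlinear transformations parameterized by deep neural networks. As a result, when a time prediction is computed, the inter-event time density is difficult to express explicitly or its integral has no closed-form solution, so the prediction must be approximated numerically. In addition, the systematic error of the model itself also causes a large deviation between predicted and true times. The insufficient prediction accuracy caused by these two factors limits the use of deep point processes in real scenarios and poses a major challenge to their adoption in practice.

A temporal point process sequence can be regarded as samples drawn from a chain of interrelated probability distributions, so the resulting deviation can be assumed to be a difference in distribution. Image motion deblurring algorithms use a Conditional Generative Adversarial Network (CGAN)^[8] to remove blur caused by shake, halo, and motion, thereby improving image sharpness^[9-10]. Inspired by this idea, this paper treats the deviation between the predicted sequence of a deep point process and the true sequence as "blur" introduced by the model and the numerical method, that is, as a difference in distribution between the predicted and true sequences, and uses a CGAN to perform a second, corrective prediction on the predicted sequence. Furthermore, because a point process sequence is a stochastic process rather than a random variable^[11], the CGAN is constrained by the dual form of the Wasserstein distance between temporal point processes together with a 1-Lipschitz regularization term^[12-13], which improves the accuracy of time prediction and reduces the prediction error.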

1 Related work

1.1 Temporal point process

A temporal point process is a special kind of counting process. As shown in Fig. 1, it counts the number of events that occur before time $ t $, and its core is the intensity function $ \lambda^{*}(t) $. Given the history of events $ \mathcal{H}_{t}=\{t_{i}\,|\,t_{i} < t\} $ and an infinitesimal time window $ [t, t+\mathrm{d}t) $, the intensity function $ \lambda^{*}(t) $ can be regarded as the rate at which events occur at time $ t $, and it is defined as follows^[2]:

Fig. 1 Schematic diagram of a temporal point process

$ \begin{array}{l}{\lambda }^{*}(t)\,\mathrm{d}t=\lambda (t|{\mathcal{H}}_{t})\,\mathrm{d}t=\\ \;\;\;\;\;\;\;\;\;\;\;\; \mathbb{P}\left(\left.\left\{\text{one event occurs in }[t, t+\mathrm{d}t)\right\}\right|{\mathcal{H}}_{t}\right)=\\ \;\;\;\;\;\;\;\;\;\;\;\;\mathbb{E}\left(\mathrm{d}N(t)|{\mathcal{H}}_{t}\right)\end{array} $

(1)

where $ N(t) $ is the total number of events that occur before time $ t $, and $ \mathbb{E}(\mathrm{d}N(t)|\mathcal{H}_{t}) $ is the expected number of events in the interval $ [t, t+\mathrm{d}t) $ given the historical observations $ \mathcal{H}_{t} $. The probability density function of the inter-event time is:

$ {f}^{\mathrm{*}}\left(t\right)={\lambda }^{\mathrm{*}}\left(t\right)\mathrm{e}\mathrm{x}\mathrm{p}\left(-{\int }_{{t}^{'}}^{t}{\lambda }^{\mathrm{*}}\left(\tau \right)\mathrm{d}\tau \right) $

(2)

where $ t^{'} $ is the time of the last event before time $ t $.

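1.2 Deep point processes and the prediction bottleneck

A deep point process uses an RNN to embed the latent information of the historical event sequence and treats the intensity function, or the probability density function of the inter-event time, as a nonlinear function of that latent history. Compared with traditional statistical point process modeling, deep point processes exploit the strong fitting ability of neural networks, but they also introduce a large number of network parameters and a series of nonlinear transformations, which make time prediction difficult and force the use of numerical methods. For example, the Recurrent Marked Temporal Point Process (RMTPP)^[3] formulates the conditional intensity function so that the inter-event time follows a Gompertz distribution: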

$ {\lambda }^{\mathrm{*}}\left(t\right)=\mathrm{e}\mathrm{x}\mathrm{p}\left({\boldsymbol{v}}^{t}{\boldsymbol{h}}_{{t}^{'}}+{w}^{t}\left(t-{t}^{'}\right)+{b}^{t}\right) $

(3)

where $ \boldsymbol{v}^{t} $, $ \boldsymbol{h}_{t^{'}} $, $ w^{t} $, and $ b^{t} $ are all produced by the neural network. The probability density function of the inter-event time is then:

$ \begin{array}{l}{f}^{*}(t)={\lambda }^{*}(t)\exp\left(-{\int }_{{t}^{'}}^{t}{\lambda }^{*}(\tau )\mathrm{d}\tau \right)=\\ \;\;\;\;\;\;\;\;\;\;\;\;\exp\left\{{\boldsymbol{v}}^{t}{\boldsymbol{h}}_{{t}^{'}}+{w}^{t}(t-{t}^{'})+{b}^{t}+\frac{1}{{w}^{t}}\exp({\boldsymbol{v}}^{t}{\boldsymbol{h}}_{{t}^{'}}+{b}^{t})-\right.\\ \;\;\;\;\;\;\;\;\;\;\;\;\left.\frac{1}{{w}^{t}}\exp({\boldsymbol{v}}^{t}{\boldsymbol{h}}_{{t}^{'}}+{w}^{t}(t-{t}^{'})+{b}^{t})\right\}\end{array} $

(4)

The predicted value $ \widehat{t} $ is computed as the expectation:

$ \widehat{t}={\int }_{{t}^{'}}^{\mathrm{\infty }}t{f}^{\mathrm{*}}\left(t\right)\mathrm{d}t $

(5)
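As a rough illustration of how Eq. (5) can be evaluated for the RMTPP density in Eq. (4), the sketch below applies the trapezoidal rule on a truncated interval. It is not the authors' implementation: the scalar `a` stands for $ \boldsymbol{v}^{t}\boldsymbol{h}_{t^{'}}+b^{t} $, `w` stands for $ w^{t} $ and is assumed positive, and the names `rmtpp_density`, `expected_next_time`, `horizon`, and `n_grid` are hypothetical.

```python
import numpy as np


def rmtpp_density(delta, a, w):
    """Eq. (4) written in the elapsed time delta = t - t', with a = v.h + b."""
    return np.exp(a + w * delta + (np.exp(a) - np.exp(a + w * delta)) / w)


def expected_next_time(t_prev, a, w, horizon=50.0, n_grid=200_001):
    """Trapezoidal approximation of Eq. (5) on [t', t' + horizon].

    Both the truncation at `horizon` and the finite grid are sources of
    numerical error; they are illustrative choices, not values from the paper.
    """
    delta = np.linspace(0.0, horizon, n_grid)
    integrand = (t_prev + delta) * rmtpp_density(delta, a, w)
    step = delta[1] - delta[0]
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * step)


# Toy call: last event at t' = 2.0, a = 0.0, and a mild positive slope w = 0.5.
t_hat = expected_next_time(t_prev=2.0, a=0.0, w=0.5)
```

The result depends on `horizon` and `n_grid`, which is exactly the kind of numerical error the second prediction is meant to absorb.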

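Because the integral in Eq. (5) generally has no closed-form solution, it must be approximated numerically, which introduces a deviation. Similarly, the fully neural network temporal point process (FullyNN)^[14] uses a neural network to output the integral of the intensity function directly: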

$ \mathit{\Phi} \left(t\right)={\int }_{{t}^{'}}^{t}{\lambda }^{\mathrm{*}}\left(\tau \right)\mathrm{d}\tau $

(6)

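However, the intensity function $ \lambda^{*}(\tau) $ can only be obtained by differentiating $ \mathit{\Phi}(t) $ and has no explicit expression, so the median $ t^{*} $ is used as the predicted value. The median $ t^{*} $ must in turn be found by bisection as the root of the equation $ \mathit{\Phi}(t^{*}-t^{'})=\ln 2 $, so no closed-form solution is available and a numerical error remains. In addition to the numerical error, both models suffer from systematic error caused by limitations of the model design, and a model can still have systematic error even when it neatly avoids the numerical error. For example, the log-normal mixture temporal point process (LogNormMix)^[15] fits the probability density function of the inter-event time with a mixture of log-normal distributions:

$ {f}^{*}(t)=\sum\limits _{k=1}^{K}{w}_{k}\frac{1}{t{s}_{k}\sqrt{2\mathrm{\pi }}}\exp\left(-\frac{{\left(\ln t-{\mu }_{k}\right)}^{2}}{2{s}_{k}^{2}}\right) $

(7)

where $ K $ is the number of mixture components, and $ w_{k} $, $ s_{k}^{2} $, and $ \mu_{k} $ are all outputs of the neural network. The expectation is then used as the predicted value $ \widehat{t} $ of the next event time:

$ \widehat{t}=\sum\limits _{k=1}^{K}{w}_{k}{\mathrm{e}}^{{\mu }_{k}+\frac{{s}_{k}^{2}}{2}} $

(8)

This procedure involves no numerical error, but because the magnitude of $ s_{k}^{2} $ is uncontrolled, extreme values arise easily and strongly affect the prediction accuracy, which results in systematic error.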
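To make the sensitivity to $ s_{k}^{2} $ concrete, the following sketch computes the mean of a log-normal mixture with the density in Eq. (7), which is the weighted sum of the component means $ \mathrm{e}^{\mu_{k}+s_{k}^{2}/2} $. The function name and the toy parameter values are hypothetical and are not taken from the paper.

```python
import math


def lognormal_mixture_mean(weights, mus, sigmas):
    """Mean of a log-normal mixture: sum_k w_k * exp(mu_k + s_k^2 / 2)."""
    return sum(
        w * math.exp(mu + 0.5 * s * s)
        for w, mu, s in zip(weights, mus, sigmas)
    )


# Two components with identical locations; only the second scale differs.
weights = [0.7, 0.3]
mus = [0.0, 0.0]
moderate = lognormal_mixture_mean(weights, mus, [0.5, 0.5])
extreme = lognormal_mixture_mean(weights, mus, [0.5, 4.0])
```

Because $ s_{k}^{2} $ enters through an exponential, a single component with a large scale dominates the predicted value, so `extreme` is orders of magnitude larger than `moderate`.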

1.3 Conditional generative adversarial network

The CGAN^[8] is a probabilistic generative model with conditional constraints. A CGAN introduces a condition variable $ \boldsymbol{y} $ as auxiliary information (class labels, data from other modalities, and so on) to guide the generator. Compared with the Generative Adversarial Network (GAN)^[16], its loss function additionally includes the condition information $ \boldsymbol{y} $:

$ \begin{array}{l}\underset{\mathrm{G}}{\min}\;\underset{\mathrm{D}}{\max}\;V(\mathrm{D}, \mathrm{G})=\\ {\mathbb{E}}_{\boldsymbol{x}\sim{P}_{\mathrm{real}}(\boldsymbol{x})}\left[\log \mathrm{D}(\boldsymbol{x}|\boldsymbol{y})\right]+{\mathbb{E}}_{\boldsymbol{z}\sim{P}_{\boldsymbol{z}}(\boldsymbol{z})}\left[\log \left(1-\mathrm{D}(\mathrm{G}(\boldsymbol{z}|\boldsymbol{y}))\right)\right]\end{array} $

(9)

For training, WGAN^[17] replaces the GAN objective with the Wasserstein distance between distributions to alleviate the instability of GAN training.
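As a small worked example of the conditional GAN value function, the sketch below estimates $ V(\mathrm{D}, \mathrm{G}) $ from discriminator outputs in its standard form, $ \log \mathrm{D}(\boldsymbol{x}|\boldsymbol{y}) $ for real samples plus $ \log(1-\mathrm{D}(\mathrm{G}(\boldsymbol{z}|\boldsymbol{y}))) $ for generated ones. The function name and the probabilities are hypothetical; no network is trained here.

```python
import math


def cgan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator outputs D(x|y) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z|y)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = cgan_value([0.5, 0.5], [0.5, 0.5])
# A confident discriminator scores real samples high and fakes low.
confident = cgan_value([0.9, 0.95], [0.1, 0.05])
```

The discriminator maximizes this value, so `confident` exceeds `undecided`, while the generator tries to push the second term down.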

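1.4 Image motion deblurring

The goal of image motion deblurring is to restore a sharp image from an image blurred by shake, halo, or motion^[9-10]. The blur model is defined as:

$ {\boldsymbol{I}}_{B}=\boldsymbol{K}\otimes {\boldsymbol{I}}_{S}+\boldsymbol{B} $

(10)

where $ \boldsymbol{I}_{B} $, $ \boldsymbol{I}_{S} $, and $ \boldsymbol{K} $ denote the vectorized blurred image, the original sharp image, and the blur kernel, respectively; $ \boldsymbol{B} $ denotes noise; and $ \otimes $ denotes the convolution operation. Image motion deblurring is the inverse of Eq. (10): it recovers the sharp image $ \boldsymbol{I}_{S} $ from the blurred image $ \boldsymbol{I}_{B} $. Because a CGAN is a probabilistic generative model and can therefore apply a restoring transformation to the image distribution, it is commonly used to implement this inverse operation.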

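2 Second prediction model for deep point processes

2.1 Principle of the second prediction

Following the principle of image motion deblurring, the error in the predicted sequence of a deep point process model, which is caused by numerical integration and by the model itself, is treated as a difference in distribution from the true sequence^[18]. A conditional generative adversarial network is then used to restore the first-prediction sequence to the true sequence.

Let the input first-prediction sequence be $ \boldsymbol{\tau}=\{\tau_{i}\}_{i=1}^{l} $ and the corresponding true sequence be $ \boldsymbol{t}=\{t_{i}\}_{i=1}^{l} $. The second prediction model assumes:

$ \boldsymbol{\tau }=\boldsymbol{K}\cdot \boldsymbol{t}+\boldsymbol{N} $

(11)

where $ \boldsymbol{K} $ is a transformation matrix and $ \boldsymbol{N}=\{n_{i}\}_{i=1}^{l} $ is a noise sequence.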

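The original WGAN assumes that the data are random variables, whereas a temporal point process is a stochastic process, so the two differ in sample space. Therefore, the Wasserstein distance between temporal point processes is used to constrain the training of the generative adversarial network and keep the method theoretically sound. It is built on the following distance between sequences^[13]:

$ {\left|\boldsymbol{x}-\boldsymbol{y}\right|}_{*}\triangleq \sum\limits _{i=1}^{l}\left|{x}_{i}-{y}_{i}\right| $

(12)

where $ \boldsymbol{x} $ and $ \boldsymbol{y} $ are realizations of two point processes.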
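The distance in Eq. (12) is simply the sum of absolute differences between matched event times of two equal-length sequences. A minimal sketch, with a hypothetical function name and toy sequences:

```python
def point_process_distance(x, y):
    """Eq. (12): sum of |x_i - y_i| over two sequences of equal length l."""
    if len(x) != len(y):
        raise ValueError("Eq. (12) is defined for sequences of equal length")
    return sum(abs(a - b) for a, b in zip(x, y))


true_times = [1.0, 2.5, 4.0]
predicted_times = [1.5, 2.0, 4.0]
gap = point_process_distance(true_times, predicted_times)  # 0.5 + 0.5 + 0.0
```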

In the conditional generative adversarial network, the generator restores the first-prediction sequence toward the corresponding true sequence; its output is called the pseudo-value sequence. The discriminator aims to maximize the Wasserstein distance between the temporal point process distribution of the pseudo-value sequences restored by the generator and that of the corresponding true sequences (the correspondence between first-prediction sequences and true sequences serves as the additional auxiliary information):

$ W({P}_{t}, {P}_{g})=\underset{\psi \in \mathit{\Psi} ({P}_{t}, {P}_{g})}{\inf}\;{E}_{(\boldsymbol{t}, g(\boldsymbol{\tau }))\sim\psi }\left[{\left|\boldsymbol{t}-g(\boldsymbol{\tau })\right|}_{*}\right] $

(13)

where $ P_{t} $ and $ P_{g} $ are the temporal point process distributions of the true sequences and the pseudo-value sequences, respectively; $ \mathit{\Psi}(P_{t}, P_{g}) $ is the set of joint distributions of $ P_{t} $ and $ P_{g} $; and $ g $ is the function represented by the CGAN generator. For ease of computation, Eq. (13) is converted to its dual form^[13]:

$ \begin{array}{l}{W}^{*}({P}_{t}, {P}_{g})=\underset{{\|f\|}_{L}\le 1}{\sup}\;{E}_{\boldsymbol{t}\sim{P}_{t}}\left[f(\boldsymbol{t})\right]-{E}_{g(\boldsymbol{\tau })\sim{P}_{g}}\left[f(g(\boldsymbol{\tau }))\right]\\ {\|f\|}_{L}\triangleq \underset{\boldsymbol{t}\ne g(\boldsymbol{\tau })}{\sup}\frac{\left|f(\boldsymbol{t})-f(g(\boldsymbol{\tau }))\right|}{{\left|\boldsymbol{t}-g(\boldsymbol{\tau })\right|}_{*}}\le 1\\ {\left|\boldsymbol{t}-g(\boldsymbol{\tau })\right|}_{*}\triangleq \sum\limits _{i=1}^{l}\left|{t}_{i}-{g(\boldsymbol{\tau })}_{i}\right|\end{array} $

(14)

where $ f $ is the function represented by the CGAN discriminator, $ {\left|\cdot\right|}_{*} $ is the distance between temporal point processes, and $ {\|f\|}_{L}\le 1 $ is the 1-Lipschitz constraint^[7].

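2.2 Second prediction algorithm and training procedure

A deep point process model is trained first to produce the first-prediction sequence, which is then corrected^[17]. The second prediction procedure consists of three steps:

Step 1 Feed the first-prediction sequence $ \boldsymbol{\tau} $ into the generator $ g $ of the conditional generative adversarial network to obtain the pseudo-value sequence, denoted $ \boldsymbol{\zeta}=g(\boldsymbol{\tau}) $.

Step 2 Feed the pseudo-value sequence $ \boldsymbol{\zeta} $ and the corresponding true sequence $ \boldsymbol{t} $ into the discriminator $ f $ to obtain the scores $ f(\boldsymbol{\zeta}) $ and $ f(\boldsymbol{t}) $, which are used to judge real from fake.

Step 3 Obtain a generator with restoring ability through adversarial training.

During training, the objective of the generator $ g $ is:

$ \underset{w}{\max}\;{L}_{g}=f(\boldsymbol{\zeta }) $

(15)

where $ w $ denotes the parameters of the generator $ g $.

The objective of the discriminator $ f $ is:

$ \underset{\mathcal{V}}{\mathrm{m}\mathrm{a}\mathrm{x}}\;{L}_{d}=f\left(\boldsymbol{t}\right)-f\left(\boldsymbol{\zeta }\right)-\left|\frac{\sum\limits _{i=1}^{l}\left|f\left({t}_{i}\right)-f\left({\zeta }_{i}\right)\right|}{\sum\limits _{i=1}^{l}\left|{t}_{i}-{\zeta }_{i}\right|}-1\right| $

(16)

where $ \mathcal{V} $ denotes the parameters of the discriminator $ f $.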
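The two objectives can be sketched for a single pair of sequences as follows. This is an illustration under stated assumptions rather than the authors' code: the Lipschitz penalty is read at the sequence level, as the ratio of the score gap to the distance of Eq. (12), and `toy_critic` is a hypothetical stand-in for the trained LSTM discriminator $ f $.

```python
def distance(x, y):
    """Eq. (12): sum of absolute differences between matched event times."""
    return sum(abs(a - b) for a, b in zip(x, y))


def generator_objective(critic, fake):
    """Eq. (15): the generator is updated to raise the score of its output."""
    return critic(fake)


def discriminator_objective(critic, real, fake, eps=1e-8):
    """Sequence-level reading of Eq. (16): score gap minus a 1-Lipschitz penalty.

    eps guards against division by zero when real and fake coincide
    (an illustrative safeguard, not part of Eq. (16)).
    """
    gap = critic(real) - critic(fake)
    ratio = abs(gap) / (distance(real, fake) + eps)
    return gap - abs(ratio - 1.0)


def toy_critic(seq):
    """Hypothetical critic: a fixed linear score, used only for illustration."""
    return 0.5 * sum(seq)


real = [1.0, 2.0, 3.0]   # true sequence t
fake = [1.5, 2.5, 3.5]   # pseudo-value sequence zeta = g(tau)
l_g = generator_objective(toy_critic, fake)
l_d = discriminator_objective(toy_critic, real, fake)
```

In training, both quantities would be averaged over a batch and maximized with respect to the generator and discriminator parameters in alternation.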

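2.3 Structure of the conditional generative adversarial network

The conditional generative adversarial network used in the second prediction model adopts a CGAN+LSTM^[19-20] structure, where the Long Short-Term Memory (LSTM) network is a widely used RNN variant. The network combines the probabilistic generative ability of the CGAN with the ability of the LSTM to embed latent historical information, so that it can accurately restore the point process distribution of the first-prediction sequence and produce the corrected second-prediction sequence^[21]. By function, the network structure is divided into three modules: reading of latent historical information, space transformation, and the generator/discriminator output layers, as shown in Fig. 2.

Fig. 2 CGAN structure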

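2.3.1 Reading latent historical information

Both the generator and the discriminator must embed the latent historical information contained in their input sequence (the first-prediction sequence for the generator; the pseudo-value sequence or the true sequence for the discriminator) into the neural network^[22]. In both networks, an LSTM first reads the sequence one time step at a time and extracts the latent historical information. While preserving temporal continuity before and after the distribution transformation, this yields a vector representation with a fixed feature dimension $ d $ ($ d > 1 $):

$ {\boldsymbol{v}}_{0}=\mathrm{LSTM}(\boldsymbol{x}) $

(17)

where $ \boldsymbol{x}\in {\mathbb{R}}^{l\times 1} $ is the input of the generator or discriminator, $ {\boldsymbol{v}}_{0}\in {\mathbb{R}}^{l\times d} $ is the corresponding latent historical information vector, and $ \mathrm{LSTM} $ denotes the LSTM network.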

2.3.2 Space transformation

After the LSTM has produced the latent historical information vector, the generator needs the ability to adjust the latent history it has read toward the true latent history^[23]. The network relies on the space-transformation and information-integration abilities of fully connected layers for this purpose.

First, $ {\boldsymbol{v}}_{0} $ passes through a fully connected layer that changes the feature dimension of the latent historical information vector, $ d\to h $ ($ h > d $), mapping it to a higher-dimensional space:

$ {\boldsymbol{v}}_{1}={\mathrm{Dense}}_{1}({\boldsymbol{v}}_{0}) $

(18)

where $ {\mathrm{Dense}}_{1} $ denotes a fully connected layer and $ {\boldsymbol{v}}_{1}\in {\mathbb{R}}^{l\times h} $.

After this transformation, two fully connected layers that successively reduce the feature dimension, $ h\to k $ ($ k < h $) and $ k\to n $ ($ n < k $), integrate the information from the higher-dimensional space and give the adjusted latent historical information^[24]:

$ \begin{array}{l}{\boldsymbol{v}}_{2}={\mathrm{Dense}}_{2}({\boldsymbol{v}}_{1})\\ {\boldsymbol{v}}_{3}={\mathrm{Dense}}_{3}({\boldsymbol{v}}_{2})\end{array} $

(19)

where $ {\mathrm{Dense}}_{2} $ and $ {\mathrm{Dense}}_{3} $ denote the two fully connected layers, $ {\boldsymbol{v}}_{2}\in {\mathbb{R}}^{l\times k} $, and $ {\boldsymbol{v}}_{3}\in {\mathbb{R}}^{l\times n} $.

2.3.3 Output layers of the generator and discriminator

The generator must output a time sequence that has the same length as the input sequence and whose values are greater than zero. Structurally, a fully connected layer first integrates the adjusted latent historical information into a vector of dimension $ l\times 1 $; a Sigmoid function then constrains its values to be positive (Sigmoid outputs lie in $ (0, 1) $); finally, a pseudo-value sequence with the same number of time steps as the input is output. This process can be written as:

$ \begin{array}{l}{\boldsymbol{v}}_{4}={\mathrm{Dense}}_{4}({\boldsymbol{v}}_{g})\\ \boldsymbol{\zeta }=\mathrm{Sigmoid}({\boldsymbol{v}}_{4})\end{array} $

(20)

where $ {\boldsymbol{v}}_{g}\in {\mathbb{R}}^{l\times n} $ is the generator's vector after the LSTM and the space transformation, $ {\boldsymbol{v}}_{4}\in {\mathbb{R}}^{l\times 1} $, $ \boldsymbol{\zeta }\in {\mathbb{R}}^{l\times 1} $, $ {\mathrm{Dense}}_{4} $ denotes a fully connected layer, and $ \mathrm{Sigmoid} $ denotes the Sigmoid function.

The discriminator outputs a real/fake score for a sequence; the higher the score, the more likely the input sequence is real. In the network, the latent historical information vectors of the pseudo-value sequence or the true sequence are first summed along the time-step dimension, a fully connected layer then integrates the information, and a Sigmoid function finally outputs a positive score. This process can be written as:

$ \begin{array}{l}{\boldsymbol{v}}_{5}={\mathrm{sum}}_{\mathrm{time}}({\boldsymbol{v}}_{d})\\ {\boldsymbol{v}}_{6}={\mathrm{Dense}}_{5}({\boldsymbol{v}}_{5})\\ \mathrm{Score}=\mathrm{Sigmoid}({\boldsymbol{v}}_{6})\end{array} $

(21)

where $ {\boldsymbol{v}}_{d}\in {\mathbb{R}}^{l\times n} $ is the discriminator's vector after the LSTM and the space transformation, $ {\boldsymbol{v}}_{5}\in {\mathbb{R}}^{1\times n} $, $ {\boldsymbol{v}}_{6}\in {\mathbb{R}}^{1\times 1} $, $ \mathrm{Score}\in {\mathbb{R}}^{1\times 1} $, and $ {\mathrm{Dense}}_{5} $ denotes a fully connected layer. As shown in Fig. 2, every layer except the last combines a Layer Normalization (LN) operation^[25] with the Dense layer to alleviate overfitting and speed up training.
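The shapes in Eqs. (17) to (21) can be traced with a small NumPy forward pass. This is a shape-level sketch with random weights, not the trained model: the dimensions `d, h, k, n`, the `tanh` activation, the placement of layer normalization after each hidden Dense layer, and a single-unit $ {\mathrm{Dense}}_{5} $ are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-5):
    """Normalize each time step over its feature dimension."""
    mean = x.mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def init(shape):
    return rng.normal(scale=0.1, size=shape)


def lstm(x, w_x, w_h, b, d):
    """Single-layer LSTM over x of shape (l, 1); returns states of shape (l, d)."""
    h, c, out = np.zeros(d), np.zeros(d), []
    for x_t in x:
        i, f, g, o = np.split(x_t @ w_x + h @ w_h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.stack(out)


def make_body(d, h, k, n):
    """Random parameters for the LSTM and Dense_1..Dense_3 (d -> h -> k -> n)."""
    return {
        "d": d,
        "lstm": (init((1, 4 * d)), init((d, 4 * d)), np.zeros(4 * d)),
        "dense": [(init((a, b)), np.zeros(b)) for a, b in [(d, h), (h, k), (k, n)]],
    }


def body(x, p):
    v = lstm(x, *p["lstm"], p["d"])          # Eq. (17): (l, 1) -> (l, d)
    for w, b in p["dense"]:                  # Eqs. (18)-(19): -> (l, n)
        v = layer_norm(np.tanh(v @ w + b))   # tanh and LN placement are assumed
    return v


def generator(x, p, head):
    w, b = head                              # Dense_4: n -> 1
    return sigmoid(body(x, p) @ w + b)       # Eq. (20): (l, 1), values in (0, 1)


def discriminator(x, p, head):
    w, b = head                              # Dense_5: n -> 1 (assumed)
    v5 = body(x, p).sum(axis=0, keepdims=True)   # sum over time: (1, n)
    return sigmoid(v5 @ w + b)               # Eq. (21): score of shape (1, 1)


d, h, k, n, l = 8, 32, 16, 4, 10
g_body, g_head = make_body(d, h, k, n), (init((n, 1)), np.zeros(1))
f_body, f_head = make_body(d, h, k, n), (init((n, 1)), np.zeros(1))
tau = rng.uniform(0.1, 1.0, size=(l, 1))     # toy first-prediction sequence
zeta = generator(tau, g_body, g_head)        # pseudo-value sequence
score = discriminator(zeta, f_body, f_head)  # real/fake score
```

Because the generator ends in a Sigmoid, its outputs lie in $ (0, 1) $, so in practice the target times would need to be scaled into that range; the scaling scheme is not specified here.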

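3 Experimental results and analysis

3.1 Datasets

To verify the effectiveness of the second prediction method, experiments are conducted on five synthetic datasets and four real-world datasets, all stored as Python dictionaries in .pickle files. The synthetic data are generated from four types of processes (the Hawkes process is used with two parameter settings, giving five datasets), configured as follows:

1) Homogeneous Poisson process (Poisson). The conditional intensity function is $ {\lambda }^{*}(t)=1 $.

2) Renewal process (Renewal). The inter-event times follow a log-normal probability density function: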

$ p\left(\tau \right)=\frac{1}{\tau \sigma \sqrt{2\mathrm{\pi }}}{\mathrm{e}}^{-\frac{{(\mathrm{l}\mathrm{n}\;\tau -\mu )}^{2}}{2{\sigma }^{2}}} $

(22)

with parameters $ \mu =1.0 $ and $ \sigma =6.0 $.

3) Self-correcting process (Self-correcting). The conditional intensity function is:

$ {\lambda }^{\mathrm{*}}\left(t\right)=\mathrm{e}\mathrm{x}\mathrm{p}\left(t-\sum\limits _{{t}_{i} < t}1\right) $

(23)

4) Hawkes process (Hawkes). The conditional intensity function is:

$ {\lambda }^{*}(t)=\mu +\sum\limits _{{t}_{i} < t}\sum\limits _{j=1}^{M}{\alpha }_{j}{\beta }_{j}\exp\left(-{\beta }_{j}(t-{t}_{i})\right) $

(24)

Two parameter settings are used for the Hawkes process:

(1) $ M=1 $, $ \mu =0.02 $, $ {\alpha }_{1}=0.8 $, $ {\beta }_{1}=1 $.

(2) $ M=2 $, $ \mu =0.02 $, $ {\alpha }_{1}=0.4 $, $ {\beta }_{1}=1 $, $ {\alpha }_{2}=0.4 $, $ {\beta }_{2}=20 $.
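For reference, event times with the intensity in Eq. (24) can be simulated by thinning. The sketch below is one common way to do this and is not claimed to be the procedure used to build the paper's datasets; the function names, the time horizon `t_max`, and the seeds are hypothetical.

```python
import math
import random


def hawkes_intensity(t, events, mu, alphas, betas):
    """Eq. (24) at time t, summing over every event already in `events`."""
    return mu + sum(
        a * b * math.exp(-b * (t - t_i))
        for t_i in events
        for a, b in zip(alphas, betas)
    )


def simulate_hawkes(mu, alphas, betas, t_max, seed=0):
    """Ogata-style thinning on [0, t_max).

    With exponential kernels the intensity only decays between events, so its
    value at the current time (counting an event just accepted at that time)
    is a valid upper bound until the next event.
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        upper = hawkes_intensity(t, events, mu, alphas, betas)
        t += rng.expovariate(upper)          # candidate time from the bound
        if t >= t_max:
            return events
        accept = rng.random() * upper <= hawkes_intensity(t, events, mu, alphas, betas)
        if accept:
            events.append(t)


# The two parameter settings listed above, on a toy horizon.
hawkes_1 = simulate_hawkes(0.02, [0.8], [1.0], t_max=500.0, seed=1)
hawkes_2 = simulate_hawkes(0.02, [0.4, 0.4], [1.0, 20.0], t_max=500.0, seed=2)
```

Both settings have $ \sum_{j}\alpha_{j}=0.8 < 1 $, so the simulated processes do not explode.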

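The real-world datasets are the restaurant review dataset Yelp toronto, the online course interaction dataset Mooc, the intensive care medical dataset Mimic, and the Wikipedia edit dataset Wikipedia. They are described as follows:

1) Yelp toronto consists of review sequences for the 300 most popular restaurants in Toronto; each record is the time sequence of customer visits to one restaurant.

2) Wikipedia is a public dataset of edit sequences on Wikipedia. The 1,000 pages with the most edits within one month are selected, yielding 157,474 user interaction records.

3) Mooc is a public dataset jointly released by Harvard University and the Massachusetts Institute of Technology. It contains 97 types of interactions of 7,047 students in Mooc online courses.

4) Mimic is a public intensive care dataset provided by Beth Israel Deaconess Medical Center (BIDMC). It records 75 types of diagnosis and treatment data for more than 40,000 patients; patients who appear at least three times are selected for the study.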

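3.2 Baseline models and experimental settings

Three deep point process models recognized by the research community are selected to produce the first predictions: the Recurrent Marked Temporal Point Process (RMTPP), the fully neural network temporal point process (FullyNN), and the log-normal mixture temporal point process (LogNormMix). RMTPP and FullyNN are used to study the deviation when numerical error and systematic error are both present, and LogNormMix is used to study the deviation caused by systematic error alone. The Root Mean Square Error (RMSE) is used as the evaluation metric, and the values before and after the second prediction are compared to verify the effectiveness of the proposed method.

The model is trained with the Adam optimizer, which adapts the step size of each parameter and helps keep the parameters from settling in suboptimal solutions. The initial learning rate is set to $ \alpha =1\mathrm{e}{-4} $, and the exponential decay rates of the first- and second-moment estimates are set to $ {\beta }_{1}=0.5 $ and $ {\beta }_{2}=0.9 $. The batch size is 64, and training runs for 100 epochs with early stopping. Within each training iteration, the discriminator is trained 5 or 10 times before the generator is trained once.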
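The role of these hyperparameters can be seen in a scalar version of the standard Adam update, together with the alternating update schedule. This is a generic sketch of Adam, not the authors' training script; `adam_step`, `update_schedule`, and `n_critic` are hypothetical names, and `eps` is the usual small stabilizing constant.

```python
import math


def adam_step(theta, grad, state, lr=1e-4, beta1=0.5, beta2=0.9, eps=1e-8):
    """One Adam update of a scalar parameter with the settings listed above."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


def update_schedule(n_iterations, n_critic=5):
    """Yield 'discriminator' n_critic times, then 'generator' once, per iteration."""
    for _ in range(n_iterations):
        for _ in range(n_critic):
            yield "discriminator"
        yield "generator"


state = {"t": 0, "m": 0.0, "v": 0.0}
theta = adam_step(1.0, 2.0, state)           # first step moves by about lr
steps = list(update_schedule(2, n_critic=5))
```

The objectives in Eqs. (15) and (16) are maximized, so in practice the gradient passed to such an update would be that of the negated objective.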

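3.3 Analysis of results

To reflect the correction effect objectively, the RMSE of the time predictions of RMTPP, FullyNN, and LogNormMix is compared before and after the second prediction, together with the percentage reduction. The results are shown in Fig. 3 and Table 1; in Table 1, bold entries are the second-prediction results and their standard deviations.

Fig. 3 Percentage reduction in RMSE after the second prediction

Table 1 RMSE values before and after the second prediction

Fig. 3 shows that the RMSE decreases to varying degrees after the second prediction, with a maximum reduction of 99.86%, a minimum of 21.18%, and an average of 77.02%. This indicates that the proposed model has a strong correction ability and supports the theoretical assumption. Across the three baselines, the average reduction ranks as RMTPP (92.48%) > LogNormMix (74.03%) > FullyNN (64.32%). A possible explanation is as follows. RMTPP fits the intensity function with a unimodal Gompertz distribution, so its systematic error is high, and the improper integral in the numerical computation of the expectation does not converge, so its numerical error is large. LogNormMix computes the expectation from a mixture of log-normal distributions and is affected by extreme values of the predicted variance, so its predictions fluctuate strongly. FullyNN outputs the integral of the conditional intensity function and finds the median with a bisection-like root search, so its error is smaller. The errors of the three models therefore rank as RMTPP > LogNormMix > FullyNN. A larger error makes the pseudo samples easier to tell apart from the real samples, so the discriminator reaches its optimum more easily during adversarial training, which in turn helps the generator learn.

To test how robust the training is to random factors, each baseline is trained 10 times on each dataset, and the resulting means and standard deviations are listed in Table 1. The average standard deviation is within 0.1%, which indicates that training is stable, little affected by random perturbations, and robust.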
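The metric and the percentage reduction reported above can be computed as in the following sketch. The function names and the toy numbers are hypothetical and are unrelated to the values in Table 1.

```python
import math


def rmse(predicted, true):
    """Root mean square error between predicted and true event times."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, true)]
    return math.sqrt(sum(squared) / len(squared))


def reduction_percent(before, after):
    """Percentage by which the RMSE drops after the second prediction."""
    return 100.0 * (before - after) / before


true_times = [1.0, 2.0, 3.0]
first_prediction = [1.0, 2.0, 5.0]
second_prediction = [1.0, 2.0, 3.5]
drop = reduction_percent(
    rmse(first_prediction, true_times),
    rmse(second_prediction, true_times),
)
```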

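3.4 Model loss

Fig. 4 shows the loss curves when the proposed model is trained on the FullyNN predictions for the Hawkes 1 dataset. The loss curves in the other settings converge in much the same way and are therefore omitted. As Fig. 4 shows, the discriminator loss rises over the first iterations; in this phase the discriminator's ability improves steadily, which lowers the generator loss. After the discriminator loss reaches its first peak, the generator loss gradually rises as the generator begins to acquire some generative ability; after the generator loss reaches its first peak, the discriminator loss rises again. After about 70 epochs of adversarial training, the generator loss remains dynamically stable while the discriminator loss approaches 0, and equilibrium is reached^[26]. The small number of iterations also indicates that the generator's ability improves readily during adversarial training.

Fig. 4 Loss function curves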

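4 Conclusion

This paper establishes a second prediction model for deep point processes. A conditional generative adversarial network performs a second prediction on the predictions of a deep point process: it inverts the distributional difference between the first-prediction sequence and the true sequence, which reduces the prediction deviation caused by the systematic error of the model and by numerical computation error. The network is trained with a loss function based on the dual form of the Wasserstein distance between temporal point processes and its 1-Lipschitz constraint. Experimental results show that the model reduces the time prediction deviation stably and effectively, and that the losses of the conditional generative adversarial network readily reach adversarial equilibrium. Future work will consider applying other types of generative adversarial networks and self-attention networks to the second prediction model to further improve time prediction.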

References

[1]	BACRY E, MASTROMATTEO I, MUZY J F. Hawkes processes in finance[J]. Market Microstructure and Liquidity, 2015, 1(1): 1550005. DOI:10.1142/S2382626615500057
[2]	DALEY D J, VERE-JONES D. An introduction to the theory of point processes[M]. 2nd ed. Berlin, Germany: Springer, 2003.
[3]	DU N, DAI H J, TRIVEDI R, et al. Recurrent marked temporal point processes: embedding event history to vector[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2016: 1555-1564.
[4]	XIAO S, YAN J C, FARAJTABAR M, et al. Learning time series associated event sequences with recurrent point process networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(10): 3124-3136. DOI:10.1109/TNNLS.2018.2889776
[5]	MEI H, EISNER J. The neural Hawkes process: a neurally self-modulating multivariate point process[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 6754-6764.
[6]	LI S, XIAO S, ZHU S, et al. Learning temporal point processes via reinforcement learning[C]//Proceedings of the 32nd Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2018: 10781-10791.
[7]	UPADHYAY U, DE A, RODRIGUEZ M G. Deep reinforcement learning of marked temporal point processes[C]//Proceedings of the 32nd Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2018: 3168-3178.
[8]	MIRZA M, OSINDERO S. Conditional generative adversarial nets[EB/OL]. [2021-11-05]. https://arxiv.org/pdf/1411.1784.pdf.
[9]	KUPYN O, BUDZAN V, MYKHAILYCH M, et al. DeblurGAN: blind motion deblurring using conditional adversarial networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 8183-8192.
[10]	KUPYN O, MARTYNIUK T, WU J R, et al. DeblurGAN-v2: deblurring (orders-of-magnitude) faster and better[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2019: 8877-8886.
[11]	AALEN O O, BORGAN O, GJESSING H K. Survival and event history analysis: a process point of view[M]. Berlin, Germany: Springer, 2008.
[12]	芦佳明, 李晨龙, 魏毅强. 自注意力时序点过程生成模型的Wasserstein学习方法[J]. 计算机应用研究, 2022, 39(2): 456-460. LU J M, LI C L, WEI Y Q. Wasserstein learning method for self-attention temporal point process generation model[J]. Application Research of Computers, 2022, 39(2): 456-460. (in Chinese)
[13]	XIAO S, FARAJTABAR M, YE X J, et al. Wasserstein learning of deep generative point process models[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Washington D. C., USA: IEEE Press, 2017: 3250-3259.
[14]	OMI T, UEDA N, AIHARA K. Fully neural network based model for general temporal point processes[C]//Proceedings of the 33rd Conference on Neural Information Processing Systems. Washington D. C., USA: IEEE Press, 2019: 15-26.
[15]	SHCHUR O, BILOŠ M, GÜNNEMANN S. Intensity-free learning of temporal point processes[C]//Proceedings of the 8th International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2020: 144-150.
[16]	GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[EB/OL]. [2021-11-05]. https://arxiv.org/pdf/1406.2661.pdf.
[17]	ARJOVSKY M, CHINTALA S, BOTTOU L. Wasserstein GAN[EB/OL]. [2021-11-05]. https://arxiv.org/pdf/1701.07875v1.pdf.
[18]	KOSTELICH E J, SCHREIBER T. Noise reduction in chaotic time-series data: a survey of common methods[J]. Physical Review E, 1993, 48(3): 1752-1763. DOI:10.1103/PhysRevE.48.1752
[19]	HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI:10.1162/neco.1997.9.8.1735
[20]	邱锡鹏. 神经网络与深度学习[M]. 北京: 机械工业出版社, 2020. QIU X P. Neural networks and deep learning[M]. Beijing: China Machine Press, 2020. (in Chinese)
[21]	XU Z, DU J, WANG J J, et al. Satellite image prediction relying on GAN and LSTM neural networks[C]//Proceedings of 2019 IEEE International Conference on Communications. Washington D. C., USA: IEEE Press, 2019: 1-6.
[22]	TAO X, GAO H Y, SHEN X Y, et al. Scale-recurrent network for deep image deblurring[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 8174-8182.
[23]	YOU C, HONG D. Nonlinear blind equalization schemes using complex-valued multilayer feedforward neural networks[J]. IEEE Transactions on Neural Networks, 1998, 9(6): 1442-1455. DOI:10.1109/72.728394
[24]	KWOK T Y, YEUNG D Y. Constructive algorithms for structure learning in feedforward neural networks for regression problems[J]. IEEE Transactions on Neural Networks, 1997, 8(3): 630-645. DOI:10.1109/72.572102
[25]	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning. Washington D. C., USA: IEEE Press, 2015: 448-456.
[26]	GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Washington D. C., USA: IEEE Press, 2017: 5769-5779.