基于候客点规划的空闲出租车路线推荐算法

引用本文

陈冬梅, 卜霄菲, 黄河, 等. 基于候客点规划的空闲出租车路线推荐算法[J]. 计算机工程, 2022, 48(2), 297-305. DOI: 10.19678/j.issn.1000-3428.0060727.

CHEN Dongmei, BU Xiaofei, HUANG He, et al. Idle Taxi Route Recommendation Algorithm Based on Waiting Point Planning[J]. Computer Engineering, 2022, 48(2), 297-305. DOI: 10.19678/j.issn.1000-3428.0060727.

基金项目

国家自然科学基金（U20A20182，61873177，62072322）

作者简介

陈冬梅（1994—），女，硕士研究生，主研方向为机器学习、智能交通;
卜霄菲，讲师、博士;
黄河，教授、博士;
杜扬，讲师、博士;
高国举，副教授、博士;
孙玉娥，教授、博士

文章历史

收稿日期：2021-01-28
修回日期：2021-02-25

Contents Abstract Full text Figures/Tables PDF

基于候客点规划的空闲出租车路线推荐算法

陈冬梅¹ , 卜霄菲² , 黄河¹ , 杜扬¹ , 高国举¹ , 孙玉娥³

1. 苏州大学计算机科学与技术学院, 江苏苏州 215006;
2. 沈阳师范大学软件学院, 沈阳 110034;
3. 苏州大学轨道交通学院, 江苏苏州 215137

收稿日期：2021-01-28；修回日期：2021-02-25

基金项目：国家自然科学基金（U20A20182，61873177，62072322）

作者简介：陈冬梅（1994—），女，硕士研究生，主研方向为机器学习、智能交通; 卜霄菲，讲师、博士; 黄河，教授、博士; 杜扬，讲师、博士; 高国举，副教授、博士; 孙玉娥，教授、博士.

E-mail：dmchen@stu.suda.edu.cn

摘要：为空闲出租车司机推荐有效的闲逛路线在提高出租车司机工作效率、减少乘客等待时间以及缓解交通压力方面具有重要作用。现有的研究工作主要集中于为空闲司机推荐完整的驾驶路线，没有考虑到真实路网环境下某些路段的可等待因素，使得推荐的路线因载客概率较低、行驶距离较长而花费成本较高。提出一种基于候客点规划的路线推荐算法，对出租车轨迹数据进行处理，并设计路径匹配算法将每个轨迹点与真实路段一一匹配。通过统计每个路段历史接载信息，并利用一种改进的多层感知机建立可预测时序接载概率的模型，结合路段的可等待因素设计一种最小花费成本的路线推荐算法。在真实数据集上的实验结果表明，与MNP、InExperence、Random算法相比，所提算法花费成本、巡航时间以及巡航路程均明显减少。

Idle Taxi Route Recommendation Algorithm Based on Waiting Point Planning

CHEN Dongmei¹ , BU Xiaofei² , HUANG He¹ , DU Yang¹ , GAO Guoju¹ , SUN Yu’e³

1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China;
2. Software College, Shenyang Normal University, Shenyang 110034, China;
3. School of Rail Transportation, Soochow University, Suzhou, Jiangsu 215137, China

Abstract: Effective cruising route recommendation for idle taxi drivers can help to improve drivers' efficiency, reduce passengers' waiting time, and alleviate traffic jam. Most existing studies focus on recommending complete driving routes for idle drivers, and ignore the waiting areas of road segments in real road networks, which causes multiple problems to recommended routes, such as reduced pick-up probability, long driving distance, and increased cost. To address this problem, we propose a route recommendation algorithm based on waiting area planning. The algorithm processes trajectory data of taxis, and employs an algorithm to map each trajectory point to a real road segment. By collecting the historical pick-up probability of each road segment, an improved multi-layer perceptron is used to establish a model for predicting the temporal sequence of pick-up probability. On this basis, we propose a cost-effective route recommendation algorithm that considers the waiting areas of road segments. The experimental results on real-world datasets show that compared with MNP, InExperence, Randombaseline and other algorithms, the proposed algorithm effectively reduces the cruising cost, cruising time, and cruising distance.

开放科学（资源服务）标志码（OSID）：

0 概述

随着定位技术和4G/5G技术的发展，装备全球定位系统（Global Position System，GPS）设备的出租车越来越多^[1-3]。通过对该类设备采集得到的海量轨迹数据进行挖掘分析，可以在改善出租车服务质量方面提供更好的数据支撑，令出租车服务更加舒适、快捷，使出租车成为更多乘客的出行选择^[4]。然而，在面临日益增长的乘客需求和出租车供给不平衡的矛盾时，如何为空闲司机规划路线，以最短时间和最短距离接载到乘客，成为学术界关注的热点问题^[5-7]。

传统的空闲出租车路线推荐主要有2种方式：载客点推荐^[8-10]和载客路线推荐^[5-7]。载客点推荐一般是通过对历史轨迹数据的上下车事件进行聚类从而发现乘客需求较多的载客点，当司机发出请求时，可以推荐司机前往最近的接载点。例如文献[8]通过历史轨迹数据检测停车点，从而并向空闲司机推荐这些停车点以最快的时间接载到乘客。文献[9]通过将底层道路网络聚类成多个路段集群，根据每个集群的特征评估其寻客潜力并进行推荐。然而，该类方法忽略了前往这些载客热点的路线并不唯一的问题，在没有经验知识的情况下，司机如何选择效益最高的路线是一个难以解决的问题。

为解决接载点推荐存在的问题，近几年的研究工作主要集中在为空闲司机推荐接载路线，原因是相比较只推荐接载点而言，直接推荐完整的路径可以为无经验的司机减少闲逛距离，从而减少空闲时的花费成本。其中，文献[5]提出路段的净利润函数，向司机推荐潜在利润最大的路线。文献[6]在文献[5]的工作基础上提出了一个基于蒙特卡洛搜索树思想的路线推荐方法，实现了动态的接载概率。文献[7]提出自适应最短预期闲逛路线的推荐方法，能够根据位置的容量和接载概率推荐路线。

目前存在的路线推荐方法主要是为空闲司机推荐一条完整的驾驶路线，直到司机接载到乘客或者遇到终点路段才停止推荐。但该方式忽略了真实路网环境下某些路段的可等待因素。例如餐饮服务集中地区、大型商场周围、地铁公交车站附近以及学校周边的可停车路段，这些路段都会在特定的时间段（如车站到达、地铁换乘、上课放学等时段）出现大量乘车需求。当司机在某时段经过这些路段时，传统方法依然会继续推荐行驶路线，却没有预判到该路段未来某段时间的乘车需求对司机决策的影响。例如当司机经过地铁站附近的路段时，如果该地铁站在未来1 min刚好到站则会出现大量换乘乘客，此时司机选择停车等待可能会更快接到乘客。相反，如果司机选择继续行驶，则会导致空闲出租车的闲逛距离过长，工作收益降低。相比较传统的路线推荐方法而言，考虑路段等待因素不仅可以帮助司机更快地接载到乘客，而且可以帮助司机降低因闲逛产生的油费损耗，从而进一步减少能源消耗以及废气排放造成的空气污染^[11-12]。此外，通过考虑路段的可等待因素还可以帮助乘客减少因换乘产生的等待时间，在提高搭载双方工作效率的同时，进一步缓解交通拥堵的压力，对提高城市的流动性具有重要意义^[13-14]。

本文针对当前路线推荐方法存在忽略路段可等待因素的问题，提出一种基于候客点规划的空闲出租车路线推荐机制。通过分析出租车行驶过程产生的轨迹数据，将轨迹点和真实路段建立联系。同时，处理出租车行走路段的历史搭载信息，从而构建接载概率预测模型。基于已有数据，以推荐花费成本更小的路线策略为准则，提出一种考虑等待时间的路线推荐算法，并在真实的出租车轨迹数据集上验证该算法的有效性。

1 相关工作

目前关于出租车路线推荐方法主要有2种：向出租车司机推荐接载热点及向出租车司机推荐完整路线。

针对第1种推荐方式，YUAN等^[13-14]通过学习乘客的移动模式和司机接载行为的知识，建立概率模型进行停车点的推荐，使司机能以最短的时间接载到乘客。WANG等^[9]提出一种基于排序的ELM模型，该模型可以根据道路集群的拾取频率来评估所有道路集群的寻客潜力并向出租车司机推荐具有最高寻客潜力的Top-k道路集群。VERMA等^[15]基于强化学习方法，利用蒙特卡抽样建模经验司机的行为，并通过司机的行为偏好捕捉乘客的需求变化，为闲逛司机推荐长期利益最大化的区域。CHEN等^[16]提出利用空闲状态计算接载点的利润并结合DBSCAN聚类算法提取高利润的乘客区域推荐给空闲司机。LI等^[17]基于司机目的点偏好这一因素提出基于目地点概率最大化的推荐算法PROMISE，为空闲司机推荐满足目的区域的一系列接载点。LI等^[18]提出一种改进的ARIMA方法，预测热点区域乘客移动的时空区域，从而挖掘出乘客的移动模式。HWANG等^[19]通过建立ON-OFF模型获取乘客下车位置和下一位乘客上车位置之间的关系，并估计出推荐点的期望利益。然而，这些推荐方法基本都是基于位置点的推荐而非实际的完整路线，而这对司机在找到乘客之前如何减少闲逛时间十分重要。

针对第2种推荐方式，向出租车司机推荐完整路线，QU等^[5]提出一种Cost-Effective的路线推荐系统，该系统首先提出一个净利润函数（乘客收益-出租车花费）来评估每个路段的潜在收益。基于递归树的思想生成寻找乘客的候选路径，并利用净利润函数计算每个候选路径的收益，并将收益最大的路径推荐给空闲出租车司机。GARG等^[6]提出一个基于蒙特卡罗搜索树的寻客策略，目的是令空闲司机与预期乘客之间的闲逛距离最小化。JI等^[11]将最小化空闲时间作为学习目标，提出一个自适应的深度强化学习策略，并进行路线推荐。该工作通过深度网络融合多个实时的内外部特征，并为每个候选路径学习一个评价分数。同时，利用强化学习策略进行学习，实时捕捉候选路径的推荐分数，并将分数最高的路线推荐给空闲司机。QU等^[7]提出一个出租车收益的路线推荐方法即自适应最短预期闲逛路线（ASER）。该方法通过设计概率网格发现潜在的接载位置和动态的接载概率，然后利用卡尔曼滤波方法预测每个网格的接载概率和容量，并在大数据应用平台下开发新的数据结构框架以提高推荐效率。LUO与LV等^[20-21]基于接载乘客的概率和容量问题，也提出一种出租车路线推荐方法，为一组司机推荐最优的路线，目的是最小化平均闲逛驾驶距离。RONG等^[22-23]基于马尔科夫决策模型为出租车司机推荐长期利润最大化的路线。尽管当前存在的空闲出租车路线推荐工作已经解决了一些问题，但是这些方法均忽略了真实路网环境下某些路段的可等待因素，使推荐的路线因载客概率较低、行驶距离较长而花费成本变高。

2 模型构建 2.1 符号说明

定义1（节点）节点是指某2个路段相交的点，可以表示为$ s=(\mathrm{i}\mathrm{d}, \mathrm{l}\mathrm{a}\mathrm{t}, \mathrm{l}\mathrm{o}\mathrm{n}) $，其中：$ \mathrm{i}\mathrm{d} $是该节点的唯一标识符；$ \mathrm{l}\mathrm{a}\mathrm{t} $和$ \mathrm{l}\mathrm{o}\mathrm{n} $分别表示该节点所在的经纬度信息。

定义2（路段） 1条完整的路径可以被多个节点$ s $分割成多个路段序列$ r $，其中每个路段均由2个节点确定，即$ r=({s}_{\mathrm{s}\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{t}}, {s}_{\mathrm{e}\mathrm{n}\mathrm{d}}) $。如果该路段不是终点路段，则与其相连的其他路段称为它的邻居路段，可以表示为$ {r}_{i\_\mathrm{n}\mathrm{e}\mathrm{i}\mathrm{g}\mathrm{h}\mathrm{b}\mathrm{o}\mathrm{r}}=\left\{{r}_{j}|{r}_{j}^{s\_\mathrm{s}\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{t}}={r}_{i}^{s\_\mathrm{e}\mathrm{n}\mathrm{d}}\right\} $。

定义3（路线）路线R由一系列相连的路段组成，可以表示为$ R=\left\{{r}_{1}\to {r}_{2}\to \cdots \to {r}_{n}\right\} $，满足对于$ \forall i, \;1\le i\le n, {r}_{i+1}^{s\_\mathrm{s}\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{t}}={r}_{i}^{s\_\mathrm{e}\mathrm{n}\mathrm{d}} $，且该路线的起点为$ R\_\mathrm{s}\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{t}={r}_{1}^{s\_\mathrm{s}\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{t}} $，该路线的终点为$ R\_\mathrm{e}\mathrm{n}\mathrm{d}={r}_{n}^{s\_\mathrm{e}\mathrm{n}\mathrm{d}} $。

定义4（路网） 1个路网可以表示为有向图$ G\left(V, E, D\right) $，其中：$ V $表示所有节点的集合$ \left\{{s}_{1}, {s}_{2}, \cdots, \right. $ $ \left.{s}_{n}\right\} $；$ E $=$ \left\{{r}_{i\to j}|{s}_{i}\left(\mathrm{i}{\mathrm{d}}_{i}, \mathrm{l}\mathrm{a}{\mathrm{t}}_{i}, \mathrm{l}\mathrm{o}{\mathrm{n}}_{i}\right), {s}_{j}\left(\mathrm{i}{\mathrm{d}}_{j}, \mathrm{l}\mathrm{a}{\mathrm{t}}_{j}, \mathrm{l}\mathrm{o}{\mathrm{n}}_{j}\right)\in V\right\} $表示所有有向路段的集合；$ D $表示为每个路段物理距离长度的集合$ \left\{{d}^{{r}_{i\to j}}|{r}_{i\to j}\in E\right\} $，可由$ \mathrm{H}\mathrm{a}\mathrm{v}\mathrm{e}\mathrm{r}\mathrm{s}\mathrm{i}\mathrm{n}\mathrm{e} $式（1）计算得出：

$ \begin{array}{l}d=2{R}_{\mathrm{R}\mathrm{a}\mathrm{d}\mathrm{i}\mathrm{u}\mathrm{s}}\times \mathrm{a}\mathrm{r}\mathrm{c}\mathrm{s}\mathrm{i}\mathrm{n}\left((\mathrm{c}\mathrm{o}\mathrm{s}{\varphi }_{1}\mathrm{c}\mathrm{o}\mathrm{s}{\varphi }_{2}\mathrm{s}\mathrm{i}{\mathrm{n}}^{2}\left(\frac{{\lambda }_{2}-{\lambda }_{1}}{2}\right)+\right.\\ \;\;\;{\left.\mathrm{s}\mathrm{i}{\mathrm{n}}^{2}\left(\frac{{\varphi }_{2}-{\varphi }_{1}}{2}\right)\right)}^{\frac{1}{2}}\end{array} $

(1)

其中：$ \varphi $是节点$ s $的纬度；$ \lambda $是节点$ s $的经度；$ {R}_{\mathrm{R}\mathrm{a}\mathrm{d}\mathrm{i}\mathrm{u}\mathrm{s}} $为地球半径（$ {6}_{}{371}^{}\mathrm{k}\mathrm{m} $）。

定义5（轨迹）轨迹是按时间序列排序的一系列GPS采样点组成的集合，可以表示为$ {T}_{r}=\left\{{p}_{1}\to \right. $ $ {p}_{2}\to \left.\cdots \to {p}_{n}\right\} $，其中每个采样点表示为$ {p}_{i}=\left\{\mathrm{l}\mathrm{a}\mathrm{t}, \mathrm{l}\mathrm{o}\mathrm{n}, \right. $ $ \mathrm{t}\mathrm{i}\mathrm{m}\mathrm{e}, \left.\mathrm{e}\mathrm{m}\mathrm{p}\mathrm{t}\mathrm{y}, \mathrm{s}\mathrm{p}\mathrm{e}\mathrm{e}\mathrm{d}\right\} $记录着采样点的位置信息、时间戳、载客状态、速度等信息。对于出租车的载客状态可以由式（2）表示：

$ \mathrm{empty}=\left\{\begin{array}{l}0, \mathrm{占用}({{\rm{表示出租车接载乘客}}})\\ 1, {{\rm{闲逛}}}({{\rm{表示出租车寻找乘客}}})\end{array}\right. $

(2)

从出租车的轨迹数据中进一步挖掘出租车是否处于停车状态，并通过该状态标识可停车路段。对于2个连续的GPS采样点$ {p}_{i} $和$ {p}_{i+1} $，利用$ \mathrm{H}\mathrm{a}\mathrm{v}\mathrm{e}\mathrm{r}\mathrm{s}\mathrm{i}\mathrm{n}\mathrm{e} $公式计算这2个采样点之间的距离与时间间隔，当距离小于预先定义的距离阈值且时间间隔大于采集间隔时，认为此时出租车的状态为停车状态，意味着司机经过该路段时可以选择停留。值得注意的是，本文定义的停车路段是司机在该路段等待有需求的乘客。在现实生活中，这类场景多数会出现在周围有车站、商场附近、居民区等的路段。

定义6（时变的接载概率）由于不同道路环境、不同天气以及不同时段等因素的影响，对于某一路段的乘客需求在不断变化，因此对于任意的某一路段$ r $在时间间隔$ [{\tau }_{{}_{1}}, {\tau }_{2}] $内的搭载概率为：

$ p\left({r}_{i}, [{\tau }_{1}, {\tau }_{2}]\right)=\frac{{N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{o}\mathrm{c}\mathrm{c}\mathrm{u}\mathrm{p}\mathrm{i}\mathrm{e}\mathrm{d}}}{{N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{o}\mathrm{c}\mathrm{c}\mathrm{u}\mathrm{p}\mathrm{i}\mathrm{e}\mathrm{d}}+{N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{c}\mathrm{r}\mathrm{u}\mathrm{i}\mathrm{s}\mathrm{i}\mathrm{n}\mathrm{g}}} $

(3)

其中：$ {N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{o}\mathrm{c}\mathrm{c}\mathrm{u}\mathrm{p}\mathrm{i}\mathrm{e}\mathrm{d}} $表示在路段$ r $上司机搭载到乘客的数量；$ {N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{c}\mathrm{r}\mathrm{u}\mathrm{i}\mathrm{s}\mathrm{i}\mathrm{n}\mathrm{g}} $表示在路段$ r $没有搭载到乘客的数量。

2.2 问题描述

定义7 当出租车司机放下乘客处于空闲状态时，可以发出寻客请求。根据空闲司机发出请求的路段$ {r}_{1} $和请求时间$ t $，为空闲出租车司机推荐1个效益最大化的接载路线$ {R}^{\mathrm{*}}=\left\{{r}_{1}, {r}_{2}, \cdots, {r}_{L}\right\} $，使得空闲司机选择这条行驶路径接载到乘客时的闲逛距离、闲逛时间和花费成本最小，其中$ L $表示选择路径的长度。

由于司机的花费成本涉及到司机在接载到乘客之前所行驶的距离和所花费的时间，因此对于每个路段的潜在花费成本可以根据式（4）得出：

$ \mathrm{Cost}\left(r, t\right)=\left\{\begin{array}{l}\left(1-p\left({r}_{i}, \left[{\tau }_{1}, {\tau }_{2}\right]\right)\right)\times \left({t}_{w}\left(r\right){F}_{\mathrm{rent}}\right), {{\rm{等待}}}\\ \left(1-p\left({r}_{i}, \left[{\tau }_{1}, {\tau }_{2}\right]\right)\right)\times \left(d\left(r\right){F}_{\mathrm{gas}}+{t}_{p}\left(r\right){F}_{\mathrm{rent}}\right), {{\rm{行驶}}}\\ \left(1-p\left({r}_{i}, \left[{\tau }_{1}, {\tau }_{2}\right]\right)\right)\times \left(d\left(r\right){F}_{\mathrm{gas}}+\left({t}_{p}\left(r\right)+{t}_{w}\left(r\right)\right){F}_{\mathrm{rent}}\right), {{\rm{等待后行驶}}}\end{array}\right. $

(4)

其中：$ p\left({r}_{i}, [{\tau }_{1}, {\tau }_{2}]\right) $表示在该路段接载到乘客的概率；$ {F}_{\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}} $是出租车司机租赁出租车的费用；$ {F}_{\mathrm{g}\mathrm{a}\mathrm{s}} $是出租车行驶$ d\left(r\right) $距离所花费的汽油费；$ {t}_{p}\left(r\right) $是出租车穿过该路段所花费的时间；$ {t}_{w}\left(r\right) $是出租车停留在该路段所花费的时间；$ t={t}_{w}+{t}_{p} $是出租车司机在该路段上花费的总时间。

对于每个路段的潜在收益可根据式（5）得出：

$ \mathrm{E}\mathrm{a}\mathrm{r}\mathrm{n}\left(r, t\right)=\frac{\sum \limits_{i=1}^{{N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{o}\mathrm{c}\mathrm{c}\mathrm{u}\mathrm{p}\mathrm{i}\mathrm{e}\mathrm{d}}}d\left(r\right){F}_{\mathrm{g}\mathrm{a}\mathrm{s}}}{{N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{o}\mathrm{c}\mathrm{c}\mathrm{u}\mathrm{p}\mathrm{i}\mathrm{e}\mathrm{d}}}p\left(r, [{\tau }_{1}, {\tau }_{2}]\right) $

(5)

其中：$ {N}_{[{\tau }_{1}, {\tau }_{2}]}^{\mathrm{o}\mathrm{c}\mathrm{c}\mathrm{u}\mathrm{p}\mathrm{i}\mathrm{e}\mathrm{d}} $表示在路段$ r $上司机搭载到乘客的数量；$ d\left(r\right){F}_{\mathrm{g}\mathrm{a}\mathrm{s}} $是接载到1个乘客获得的收益。

因此对于每个路段的净收益可由式（6）计算得出：

$ \mathrm{P}\mathrm{r}\mathrm{o}\left(r, t\right)=\mathrm{E}\mathrm{a}\mathrm{r}\mathrm{n}\left(r, t\right)-\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left(r, t\right) $

(6)

由该公式可以看出影响司机收益的重要因素是在闲逛时所花费的时间和距离。因此，如果想要提高司机的收益，需要不断减小司机在闲逛时所花费的成本，即$ \mathrm{M}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{i}\mathrm{s}\mathrm{e}\left(\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left(r, t\right)\right) $。

基于以上定义，对于任意推荐路线$ R=\left\{{r}_{1}, {r}_{2}, {r}_{3}, \cdots, {r}_{L}\right\} $可以计算其花费成本函数为：

$ \begin{array}{l} P\left(R\right)=\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{1}, t\right)+\\ \;\;\;\;\;\;\;\;\sum \limits_{i=2}^{L}\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{i}\right)\prod \limits_{j=1}^{i-1}\left(1-p\left({r}_{j}, \left[{\tau }_{1}, {\tau }_{2}\right]\right)\right) \end{array} $

(7)

式（7）表示对于推荐的路径$ R $来说，当出租车司机在路段$ {r}_{L} $接到乘客时，所花费的成本最小。

以往的推荐算法是推荐一个连续驾驶路线，只考虑经过路段的当前接载概率，若经过该路段没有接载到乘客，则会继续推荐路线，导致闲逛距离变长，花费成本变高。而本文算法考虑了路段的可等待因素。当出租车司机经过靠近地铁站、车站、学校等地方的路段时，能够预判未来某个时刻可能发生的事件，比如由于地铁到站而出现大量换乘乘客。本文算法推荐的具体步骤如下：1）根据历史GPS轨迹信息为每个路段统计出最长等待时间；2）设定司机在某路段等待的时间阈值是3 min，单位时间间隔是1 min；3）当空闲司机在$ {r}_{1} $路段发出请求时，对于$ {r}_{1} $的邻居路段$ \{{r}_{2}, {r}_{3}, {r}_{4}\} $以及等候间隔有2种方案选择。这2种方案具体如下：

方案1 在请求路段等待1~3 min，接到乘客；

方案2 在请求路段等待1~3 min后，选择前往下一个路段。

对于某个司机在某一路段的选择可由式（8）表示：

$ \begin{array}{l} {f}_{\mathrm{c}\mathrm{a}\mathrm{s}\mathrm{e}}= \\ \underset{i, j}{\mathrm{a}\mathrm{r}\mathrm{g}\mathrm{m}\mathrm{i}\mathrm{n}}\left[\begin{array}{l}\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{1}, {t}_{1}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{1}, {t}_{2}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{1}, {t}_{3}\right)\\ \mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{2}, {t}_{1}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{2}, {t}_{2}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{2}, {t}_{3}\right)\\ \mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{3}, {t}_{1}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{3}, {t}_{2}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{3}, {t}_{3}\right)\\ \mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{4}, {t}_{1}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{4}, {t}_{2}\right), \;\mathrm{C}\mathrm{o}\mathrm{s}\mathrm{t}\left({r}_{4}, {t}_{3}\right)\end{array}\right] \end{array}$

(8)

其表达含义是选择最小花费的行驶路段以及等待时间时的方案。因此，对于所有行驶时间小于$ T $ min，驾驶路段不超过$ L $的候选路径$ R=\left\{{R}_{1}, {R}_{2}, \cdots, {R}_{n}\right\} $，本文的目标是选择1个花费成本最小的候选路径$ {R}^{\mathrm{*}} $，满足：

$ \mathrm{M}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{i}\mathrm{s}\mathrm{e}\left(\sum \limits_{j=1}^{L}{\rm{cost}}\left({r}_{i}, {t}_{j}\right)\right) $

(9)

3 算法设计

如图 1所示，基于候客点规划的空闲出租车路线推荐方法主要分为3个步骤：1）进行数据预处理，利用最短路径匹配算法将GPS轨迹匹配到真实的路网环境中，并依据连续2个GPS点之间的距离和时间属性识别可停车路段，将停车时间超过5 min的路段标记出来；2）根据每个路段发生接载事件的特征，利用改进的多层感知机有效地预测不同天、不同时段以及不同天气情况下的接载概率；3）根据每个路段的接载概率，设计考虑等待时间的最小成本优化路径算法，从而进行有效的路线推荐，实现效益最大化。

	Download: JPG larger image
图 1 出租车路线推荐框架 Fig. 1 Framework of taxi route recommendation

3.1 路段匹配算法

在实际的数据采集过程中，由于设备的精度误差、路网障碍物、信号强弱因素的影响，GPS设备采集到的轨迹点往往散落在道路周围而无法匹配到具体的路段上。本文首要解决的问题是为轨迹数据匹配具体的路段，并根据状态位的表示为每个路段统计载客次数和空载次数，具体的算法描述及操作过程如下：

算法1 路段匹配

输入连续的GPS轨迹点$ {p}_{1}\to {p}_{2}\to \cdots \to {p}_{n} $，路网$ G $

输出一系列匹配{matchsegment}

初始化 $ \mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{G}\mathrm{P}\mathrm{S}=\mathrm{n}\mathrm{u}\mathrm{l}\mathrm{l} $，$ \mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{S}\mathrm{e}\mathrm{g}=\mathrm{n}\mathrm{u}\mathrm{l}\mathrm{l} $

1.设置$ \mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\;\;\;\;\;\mathrm{G}\mathrm{P}\mathrm{S}={\mathrm{p}}_{1} $

2.foreach $ {\mathrm{p}}_{\mathrm{i}} $：/* i≥2*/

3.$ \mathrm{p}\mathrm{r}\mathrm{e}\mathrm{v}\mathrm{i}\mathrm{o}\mathrm{u}\mathrm{s}\mathrm{G}\mathrm{P}\mathrm{S}=\mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{G}\mathrm{P}\mathrm{S} $

4.$ \mathrm{p}\mathrm{r}\mathrm{e}\mathrm{v}\mathrm{i}\mathrm{o}\mathrm{u}\mathrm{s}\mathrm{S}\mathrm{e}\mathrm{g}=\mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{S}\mathrm{e}\mathrm{g} $

5.$ \mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g}=\mathrm{b}\mathrm{u}\mathrm{f}\mathrm{f}\mathrm{e}\mathrm{r}\left(\mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{G}\mathrm{P}\mathrm{S}, \mathrm{r}\mathrm{a}\mathrm{d}\mathrm{i}\mathrm{u}\mathrm{s}\right) $

6.if $ \mathrm{l}\mathrm{e}\mathrm{n}\left(\mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g}\right)==0 $

7.return $ \mathrm{n}\mathrm{u}\mathrm{l}\mathrm{l} $

8.if $ \mathrm{l}\mathrm{e}\mathrm{n}\left(\mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g}\right)==1 $

9.return $ \mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g}.\mathrm{g}\mathrm{e}\mathrm{t}\left(0\right) $

10.设置$ \mathrm{m}\mathrm{i}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{s}=+\mathrm{\yen } $

11.for each $ \mathrm{s}\mathrm{e}\mathrm{g}\mathrm{m}\mathrm{e}\mathrm{n}\mathrm{t} $in $ \mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g} $

12.$ \mathrm{n}\mathrm{e}\mathrm{a}\mathrm{r}\mathrm{S}\mathrm{e}\mathrm{g}=\mathrm{f}\mathrm{i}\mathrm{n}\mathrm{d}\mathrm{d}\mathrm{i}\mathrm{s}\left(\mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{G}\mathrm{P}\mathrm{S}, \mathrm{s}\mathrm{e}\mathrm{g}\mathrm{m}\mathrm{e}\mathrm{n}\mathrm{t}, \mathrm{m}\mathrm{i}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{s}\right) $

13.if $ \mathrm{l}\mathrm{e}\mathrm{n}\left(\mathrm{n}\mathrm{e}\mathrm{a}\mathrm{r}\mathrm{S}\mathrm{e}\mathrm{g}\right)==1 $

14.return $ \mathrm{n}\mathrm{e}\mathrm{a}\mathrm{r}\mathrm{S}\mathrm{e}\mathrm{g}.\mathrm{g}\mathrm{e}\mathrm{t}\left(0\right) $

15.else for each $ \mathrm{s}\mathrm{e}\mathrm{g}\mathrm{m}\mathrm{e}\mathrm{n}\mathrm{t} $in $ \mathrm{n}\mathrm{e}\mathrm{a}\mathrm{r}\mathrm{S}\mathrm{e}\mathrm{g} $

16.$ \mathrm{m}\mathrm{a}\mathrm{t}\mathrm{c}\mathrm{h}\mathrm{s}\mathrm{e}\mathrm{g}\mathrm{m}\mathrm{e}\mathrm{n}\mathrm{t} $ = $ \mathrm{f}\mathrm{i}\mathrm{n}\mathrm{d}\mathrm{a}\mathrm{n}\mathrm{g}\mathrm{l}\mathrm{e}(\mathrm{s}\mathrm{e}\mathrm{g}\mathrm{m}\mathrm{e}\mathrm{n}\mathrm{t}, \mathrm{p}\mathrm{r}\mathrm{e}\mathrm{v}\mathrm{i}\mathrm{o}\mathrm{u}\mathrm{s}\mathrm{G}\mathrm{P}\mathrm{S}, $ $ \mathrm{c}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{G}\mathrm{P}\mathrm{S}) $

17.return $ \mathrm{m}\mathrm{a}\mathrm{t}\mathrm{c}\mathrm{h}\mathrm{s}\mathrm{e}\mathrm{g}\mathrm{m}\mathrm{e}\mathrm{n}\mathrm{t} $

18.end for

结合实例（如图 2）对路段匹配算法1进行分析。对于待处理轨迹点$ {p}_{2} $，首先根据路网数据为其划分1个缓冲区buffe，如图 2（a）所示。该缓冲区由当前点$ {p}_{2} $为中心点，radius作为半径进行划分。接着找到与该缓冲区相交的所有路段作为$ {p}_{2} $的候选路段$ \mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g} $（算法1第5行）。然后对候选路段$ \mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g} $进行判断：情况1，如果候选路段为空则返回$ \mathrm{n}\mathrm{u}\mathrm{l}\mathrm{l} $，匹配失败；情况2，如果只包含一个候选路段则直接返回该路段作为匹配路段（如算法1第6~9行所示）；情况3，如果存在多个候选路段，如图 2（a）所示，$ \mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g}=\{{r}_{1}, {r}_{2}, {r}_{3}\} $，则继续进行判断。遍历所有候选路段$ \mathrm{c}\mathrm{a}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{S}\mathrm{e}\mathrm{g} $，通过最短距离函数finddis计算出与$ {p}_{2} $点距离最近的所有路段$ \mathrm{N}\mathrm{e}\mathrm{a}\mathrm{r}\mathrm{S}\mathrm{e}\mathrm{g} $并判断：情况1，若只包含1个最近路段则直接返回该路段作为匹配路段（如算法1第10~14行所示）；情况2，若存在多个最近路段，如图 2（b）所示，nearSeg={r₁，r₂}，则继续利用最小角度函数findangle计算与行驶方向夹角最小的路段作为匹配路段（如算法1第15~17行所示），如图 2（c）所示。最后，返回最佳匹配路段$ \mathrm{M}\mathrm{a}\mathrm{c}\mathrm{h}\mathrm{s}\mathrm{e}\mathrm{g}\mathrm{m}\mathrm{e}\mathrm{n}\mathrm{t} $=$ {r}_{1} $，如图 2（d）所示。

	Download: JPG larger image
图 2 道路匹配实例 Fig. 2 Example of road matching

3.2 基于历史数据的接载概率预测

在路线推荐问题中，有效预测不同路段的接载概率非常重要。由于不同位置、不同时段、不同天气等因素的影响，潜在的接载概率不断发生变化。以不同地区为例，图 3分别表示了上海市虹桥机场和上海市南京步行街在不同时刻发生接载事件的情况，从图 3（a）可以看出，在上午04：00—08：00时，有较多乘客选择前往机场，并在机场下车。同一时段也可以发现，从机场选择搭载出租车出行的需求比较少，导致在该时段有大量出租车空闲。因此，即使是机场这样的接载热点，此时的接载概率也非常低。而在晚上20：00以后情况则刚好相反，该时段内大量乘客到达机场并选择前往市区，接载概率变高，可以将该地区推荐给空闲司机。

	Download: JPG larger image
图 3 不同地区不同时段接载需求变化 Fig. 3 Changes in pick-up demand over time in different regions

如图 3（b）所示，对于商业区来说，供需不平衡主要发生在08：00—12：00，此时前往该地区的乘客较多，离开的人数比较少，可以看出在白天前往商业区的需求比较大，而晚上情况相反，更多的乘客选择从步行街离开，并前往不同的地方。因此，为有效学习这些因素对接载概率的影响，本文构建了一种基于多指标输入的多层感知机模型（Multi-Layer Perceptron Model，MLP）进行接载概率的预测。

针对接载概率预测问题，大多数研究认为接载概率的变化往往与时间序列呈现紧密的关系，并利用传统的多层感知机方法，将单一的时间序列作为输入数据，设计滞后步长并进行接载概率的预测。然而通过前文的实例分析可以发现，影响路段接载概率的因素十分复杂（比如天气、高平峰时段等）。这些复杂的因素与路段的接载概率呈现非线性相关，仅通过自身的时间序列进行建模无法捕捉到真正的变化规律。因此，为提高预测精度，本文将提取多个相关因素如日期属性、天气、高平峰时段、时间间隔、接载数量和空载数量作为模型的输入，然后通过建立一个实时的接载概率预测模型，进行预测与分析。

为合理预测不同路段的接载概率，本文首先对空闲出租车在路段的行驶条件做出相应的假设。假设1：空闲出租车在巡航路段均为单向行驶，不会存在倒车；假设2：空闲出租车不会在同一路段震荡行驶（走走停停）。基于上述假设条件，本文对所选取的因素进行量化处理。比如：日期属性（1代表工作日、2代表周末、0代表国家法定节假日）；天气（-1代表雨天、1代表晴天、0代表多云）；高平峰（07：00—10：00用1表示、17：00—20：00用2表示、其余时段用0表示）；时间间隔（以1min为间隔划分工作时间段05：00—23：00共1 140个间隔）、接载数量（每个间隔内载客次数）、空载数量（每个间隔内空载次数）。将量化后的特征作为模型的输入，模型的输出即为预测不同路段的接载概率。对于感知机模型来说，不同的神经元个数以及不同的激活函数影响着训练模型的复杂程度。本文选取上海市淮海中路这一路段，提取出该路段在30天内发生接载事件的特征数据，并以其80%的数据作为训练数据，剩余20%作为测试数据。经过多次训练发现，当训练的网络层数选择3层，神经元的个数选择256，激活函数选择ReLU时效果最佳。该模型的预测结果如图 4所示，从图中可以看出预测值和真实值的分布较为一致，验证了预测模型的稳定性。

	Download: JPG larger image
图 4 预测结果对比 Fig. 4 Comparison of forecast results

为进一步验证基于多指标输入的多层感知机模型的性能，本文利用只考虑时间序列的感知机模型作为对比，并使用MAE、MAPE、R²作为指标对预测效果进行评估，结果如表 1所示。由表 1可知，改进后的多层感知机在MAE、MAPE、R²指标上分别提高了0.5%、13%和16%。

下载CSV 表 1 预测精度 Table 1 Prediction accuracy

3.3 考虑等待时间的路线推荐算法

在3.1节、3.2节中对路段数据的处理使每个路段不仅包含了节点的经纬度信息，还包含了历史的接载情况、穿行该路段的距离以及时间。依据这些信息可以利用式（4）和式（8）计算出每个路段的花费成本，然后根据推荐算法推荐效益最好的路线。

算法2 考虑等待时间的路线推荐

输入司机请求reg=（r₁，t），道路网络$ G $，搜索时间限制$ T $，搜索路段限制$ L $

输出推荐路线$ {R}^{*} $

初始化 最小花费mincost = +∞，等待间隔waitsolt = n，路段集合$ R=\mathrm{\varnothing } $

1.设置$ \mathrm{c}\_\mathrm{r}={\mathrm{r}}_{1} $，$ \mathrm{c}\_\mathrm{t}=\mathrm{t} $，$ \mathrm{j}=1 $，$ \mathrm{R}=\left\{\mathrm{c}\_\mathrm{t}\right\} $

2.while $ \mathrm{l}\mathrm{e}\mathrm{n}\mathrm{g}\mathrm{t}\mathrm{h}\left(\mathrm{R}\right){{\rm{£}} }\mathrm{L} $ or $ (\mathrm{c}\_\mathrm{t} $-$ \mathrm{t})\left(\right){{\rm{£}} }\mathrm{T} $

3.foreach待选择路段$ {r}_{i} $ in $ \left\{\mathrm{c}\_\mathrm{r}\right\}\bigcup \mathrm{c}\_{\mathrm{r}}_{\mathrm{n}\mathrm{e}\mathrm{i}\mathrm{g}\mathrm{h}\mathrm{b}\mathrm{o}\mathrm{r}} $

4.foreach等待间隔$ {\mathrm{t}}_{\mathrm{j}}{{\rm{£}} }\mathrm{n} $

5.计算当前接载概率$ \mathrm{p}=\mathrm{M}\mathrm{L}\mathrm{P}({\mathrm{r}}_{\mathrm{i}}, \mathrm{c}\_\mathrm{t}) $

6.计算当前成本$ \mathrm{v}\mathrm{a}\mathrm{l}\mathrm{u}\mathrm{e}=\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{t}({\mathrm{r}}_{\mathrm{i}}, {\mathrm{t}}_{\mathrm{j}}) $

7.if $ \mathrm{v}\mathrm{a}\mathrm{l}\mathrm{u}\mathrm{e}{{\rm{£}} }\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{t} $

8.$ \mathrm{m}\mathrm{i}\mathrm{n}\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{t}=\mathrm{v}\mathrm{a}\mathrm{l}\mathrm{u}\mathrm{e} $

9$ .\mathrm{t}\mathrm{m}\mathrm{p}\_\mathrm{r}={\mathrm{r}}_{\mathrm{i}} $

10.$ \mathrm{t}\mathrm{m}\mathrm{p}\_\mathrm{t}={\mathrm{t}}_{\mathrm{j}} $

11.end if

12.$ \mathrm{R}=\mathrm{R}\bigcup \left\{\mathrm{t}\mathrm{m}\mathrm{p}\_\mathrm{r}\right\} $

13.if $ \mathrm{c}\_\mathrm{r}={\mathrm{r}}_{\mathrm{i}} $

14.$ \mathrm{c}\_\mathrm{t}=\mathrm{c}\_\mathrm{t}+\mathrm{t}\mathrm{m}\mathrm{p}\_\mathrm{t} $

15.else

16.$ \mathrm{c}\_\mathrm{t}=\mathrm{c}\_\mathrm{t}+\mathrm{t}\mathrm{m}\mathrm{p}\_\mathrm{t}+{\mathrm{t}}_{\mathrm{p}}\left({\mathrm{r}}_{\mathrm{i}}\right) $

17.$ \mathrm{c}\_\mathrm{r}=\mathrm{t}\mathrm{m}\mathrm{p}\_\mathrm{r} $

18.end while

19.return $ {\mathrm{R}}^{\mathrm{*}}=\mathrm{R} $

算法2描述了考虑等待时间的最小成本推荐方法的伪代码。该算法首先从空闲司机的请求中获取司机所在的当前路段$ r $和当前时间$ t $，并根据所在路段位置和时间信息得到一系列的邻居路段$ c\_{r}_{\mathrm{n}\mathrm{e}\mathrm{i}\mathrm{g}\mathrm{h}\mathrm{b}\mathrm{o}\mathrm{r}} $和当前路段的接载概率p（如算法2第1~5行所示）。然后，基于图搜索算法中的广度优先搜索算法策略对当前路段的每个选择（等待时间间隔，路段选择），并计算其潜在的花费成本（如算法2第6行所示），选择花费成本最小的方案（等待间隔，路段）作为当前的选择（如算法2第7~12行所示）。该过程被不断重复直到推荐的路段长度超过$ L $或者推荐的搜索总时间超过$ T $时，则停止推荐并得出最小成本的推荐路线。

为解释、验证本文提出的算法，下面将通过一个具体实例进行说明。如图 5所示，假设每个路段的行驶时间为30 s，路段的等待间隔为2 min，行驶距离为120 m，其单位时间的租赁费为0.1，汽油费为0.01。当司机在$ {r}_{4} $路段在上午08：15发出寻客请求时，由于直接穿行$ {r}_{4} $和$ {r}_{5} $路段的花费最小，因此直接到达$ {s}_{6} $节点时间为上午08：16。此时对于传统推荐算法而言，如果在$ {r}_{5} $路段未接到乘客，则会继续选择下一个行驶路段。假设选择的行驶路段为$ {r}_{3} $，此时的花费成本为4.2。而根据式（8）可知，本文推荐的算法在该路段节点选择等待2 min接载到乘客所花费的成本最小，为1.2。原因是该路段在上午08：18时刻会有公交车到站，出现大量换乘乘客，乘客需求增加，导致该路段的接载概率提高，使得空闲司机更容易接载到乘客。

	Download: JPG larger image
图 5 考虑等待时间的路线推荐案例 Fig. 5 Route recommendation case considering waiting time

4 实验结果与分析 4.1 实验数据及实验环境

出租车数据：本文所采用的实验数据来自上海10 694名出租车司机在2015年4月1日—2015年4月30日生成的轨迹数据，其中每条数据的采集频率是10 s，采集内容包含司机编号、位置、时间、速度、状态等，详细信息如表 2所示。

下载CSV 表 2 上海市出租车GPS轨迹数据示例 Table 2 Example of GPS trajectories data taxis in Shanghai

从OpenStreetMap中获取上海市真实道路网络的所有信息，其中包含465 735个节点和197 758个路段，并且在处理数据的过程中为每个路段标记其穿行时间、穿行距离和历史接载数据。

本文实验采用的性能评估和分析代码是用python语言编写，并在Intel^® Core^TM i5-8250 CPU@1.60 GHz，8 GB内存64位Windows10系统上运行。

4.2 对比算法

为评估本文所提算法，采用以下4种算法作为对比，进一步验证该算法的有效性。

1）MNP算法。为空闲司机推荐完整路线，通过净利润函数计算每个路段的潜在效益，并分别利用暴力法和递归策略筛选出最佳的推荐路线^[5]。

2）InExperience算法。利用历史轨迹数据计算每个司机的历史收入，把收入最低的前10%的司机作为无经验司机，并选择其空闲时所行驶的路线作为对比。

3）Random算法。采用随机方式为每种决策编号，并通过随机函数选择路线。

4）WTR算法。本文提出的基于候客点规划的空闲出租车路线推荐算法。

4.3 对比指标

本文选择提升性能比$ {r}^{\mathrm{i}\mathrm{m}\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{v}\mathrm{e}} $作为实验的性能指标，其含义为相比较其他算法所提升的百分比。当获得一个司机请求时，从不同算法推荐路线所需要的花费成本、闲逛时间和闲逛距离3个方面进行性能验证，计算式如式（10）~式（12）所示：

$ {r}_{\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{t}}^{\mathrm{i}\mathrm{m}\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{v}\mathrm{e}}=\frac{{B}_{\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{t}}-{W}_{\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{t}}}{{B}_{\mathrm{c}\mathrm{o}\mathrm{s}\mathrm{t}}}\times 100\mathrm{\%} $

(10)

$ {r}_{\mathrm{t}\mathrm{i}\mathrm{m}\mathrm{e}}^{\mathrm{i}\mathrm{m}\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{v}\mathrm{e}}=\frac{{B}_{\mathrm{t}\mathrm{i}\mathrm{m}\mathrm{e}}-{W}_{\mathrm{t}\mathrm{i}\mathrm{m}\mathrm{e}}}{{B}_{\mathrm{t}\mathrm{i}\mathrm{m}\mathrm{e}}}\times 100\mathrm{\%} $

(11)

$ {r}_{\mathrm{d}\mathrm{i}\mathrm{s}}^{\mathrm{i}\mathrm{m}\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{v}\mathrm{e}}=\frac{{B}_{\mathrm{d}\mathrm{i}\mathrm{s}}-{W}_{\mathrm{d}\mathrm{i}\mathrm{s}}}{{B}_{\mathrm{d}\mathrm{i}\mathrm{s}}}\times 100\mathrm{\%} $

(12)

式（10）表示本文推荐的算法WTR与对比算法相比，在花费成本这一指标上所改善的性能百分比。式（11）表示本文推荐的WTR算法相比较对比算法，在闲逛时间这一指标上所改善的性能百分比。式（12）表示本文推荐的WTR算法相比较对比算法，在闲逛距离这一指标上所改善的性能百分比。

4.4 实验结果

本文首先在上海市淮海中路附近（经度121.457 0°至121.486 8°，纬度31.219 8°至31.230 9°）随机生成规模大小分别为10个、100个、1 000个空闲司机请求数据。然后，对每个司机的请求位置和时间分别利用本文提出的WTR算法和对比算法进行相应的路线推荐。其中，路段限制参数$ L $和时间限制参数$ T $分别是5 km和10 min。最后，根据式（10）~式（12）做出相应的计算，对比实验结果如图 6所示。

	Download: JPG larger image
图 6 实验结果对比 Fig. 6 Comparison of experimental results

图 6（a）表示空闲司机的推荐数量与推荐路线花费成本之间的实验结果，从图中可以看出本文考虑等待时间因素的推荐算法与算法MNP、InExperence、Random相比均有所提升。其中，当推荐次数为10时，WTR算法的花费成本相比较MNP、InExperience、Random分别减少了44.6%、53.2%、49.45%。当推荐次数为100时，分别减少了42.3%、51.5%、45.2%；当推荐次数等于1 000时，分别减少了35.7%、40.1%、41%。对于整体的趋势而言，随着推荐次数的增加，推荐性能的百分比下降，这是因为随着空闲司机的增多，可接载的乘客数量变得越来越少，这对于所有的推荐方法来说都将很难找到乘客，因此推荐成本的差距会变得越来越小。

图 6（b）表示空闲司机的推荐数量与推荐路线闲逛时间之间的实验结果，从图中可以看出本文考虑等待时间因素的推荐算法与算法MNP、InExperence和Random相比均有所提升。其中当推荐次数为10时，WTR算法的闲逛时间与算法MNP、InExperience、Random相比分别减少了21.2%、33.3%、29.7%；当推荐次数为100时，分别减少了18.3%、28.5%、12.2%；当推荐次数等于1 000时，分别减少了13.3%、12.2%、12.7%。随着推荐次数的增加，闲逛时间的提升百分比不断下降。这是因为随着乘客需求的减少，任何一种推荐方法都难以接载到乘客，其闲逛时间的差距也会逐渐减小。

图 6（c）表示空闲司机的推荐数量与推荐路线闲逛距离之间的实验结果，从图中可以看出本文提出的推荐算法WRT与算法MNP、InExperence、Random相比均有所提升。其中，当推荐次数为10时，WTR算法的闲逛距离与算法MNP，InExperience、Random相比分别减少了41.7%、53%、48%；当推荐次数为100时，分别减少了43.6%、57.3%、46.4%；当推荐次数为1 000时，分别减少了46.4%、58.2%、76.1%。对于花费成本和闲逛时间的变化趋势，闲逛距离随着推荐次数的增加呈上升趋势，原因是本文算法考虑了等待时间因素，当推荐次数逐渐增加而乘客需求不断减少时，任何一种推荐方法都难以接到乘客。不同于MNP、InExperience、Random算法，本文所提算法在这种情况下更多选择了停留等待的决策来减少行驶距离所带来的成本。

由以上分析可以得出，本文所提基于候客点规划的空闲出租车路线推荐算法通过考虑某些路段的等待因素，可以让空闲司机更快地接载到乘客，从而减少司机的闲逛时间和闲逛距离，提高出租车的运输效率。

5 结束语

现有的空闲司机路线推荐算法在考虑真实路网环境时忽略了路段可等待因素，导致所推荐的路线花费成本较高。为此，本文提出一种基于候客点规划的空闲出租车路线推荐算法。通过处理出租车轨迹数据及统计每个路段的历史接载信息，将轨迹点与真实路段进行匹配，并利用改进的多层感知机对出租车的接载概率进行建模。此外，在考虑路段可等待因素的基础上设计一种路线推荐算法。在真实的上海出租车轨迹数据集上的实验结果表明，相较于MNP、InExperence、Random等算法，本文所提算法在花费成本、巡航时间以及巡航距离方面均有明显减少。下一步将通过研究司机的目的地偏好，在本文工作基础上为空闲司机设计个性化的推荐机制。

参考文献

[1]	LI X C, CONG G, CHENG Y. Spatial transition learning on road networks with deep probabilistic models[C]//Proceedings of the 36th International Conference on Data Engineering. Washington D. C., USA: IEEE Press, 2020: 349-360.
[2]	WANG S, BAO Z F, CULPEPPER J. S, et al. A survey on trajectory data management, analytics, and learning[J]. ACM Computing Surveys, 2020, 1(1): 1-36.
[3]	李浩, 霍雯, 裴春营, 等. 融合VGG与FCN的智能出租车订单预测模型[J]. 计算机工程, 2020, 46(12): 276-282. LI H, HUO W, PEI C Y, et al. Intelligent taxi order forecasting model fusing VGG with FCN[J]. Computer Engineering, 2020, 46(12): 276-282. (in Chinese)
[4]	孙冠东, 张兵, 刘禹岍, 等. 基于载客数据的出租车热门区域功能发现[J]. 计算机工程, 2017, 43(5): 16-22. SUN G D, ZHANG B, LIU Y Q, et al. Taxi hot area function discovery based on passenger data[J]. Computer Engineering, 2017, 43(5): 16-22. (in Chinese)
[5]	QU M, ZHU H S, LIU J M, et al. A cost-effective recommender system for taxi drivers[C]//Proceedings of 2014 ACM Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2014: 45-54.
[6]	GARG N, RANU S. Route recommendations for idle taxi drivers: find me the shortest route to a customer! [C]//Proceedings of 2018 ACM Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2018: 1425-1434.
[7]	QU B T, YANG W X, CUI G, et al. Profitable taxi travel route recommendation based on big taxi trajectory data[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(2): 653-668.
[8]	YUAN J, ZHENG Y, XIE X, et al. T-drive: enhancing driving directions with taxi drivers' intelligence[J]. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(1): 220-232.
[9]	WANG R, CHOW C Y, LYU Y, et al. TaxiRec: recommending road clusters to taxi drivers using ranking-based extreme learning machines[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(3): 585-598.
[10]	LIU Y C, LIU C R, LU X J, et al. Point-of-interest demand modeling with human mobility patterns[C]//Proceedings of 2017 ACM International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2017: 947-955.
[11]	JI S G, WANG Z Y, LI T R, et al. Spatio-temporal feature fusion for dynamic taxi route recommendation via deep reinforcement learning[J]. Knowledge-Based Systems, 2020, 205: 106302.
[12]	QIN G Y, LI T N, YU B, et al. Mining factors affecting taxi drivers' incomes using GPS trajectories[J]. Transportation Research Part C: Emerging Technologies, 2017, 79: 103-118.
[13]	YUAN J, ZHENG Y, ZHANG L H, et al. Where to find my next passenger[C]//Proceedings of 2011 ACM International Conference on Ubiquitous Computing. New York, USA: ACM Press, 2011: 109-118.
[14]	YUAN N J, ZHENG Y, ZHANG L H, et al. T-Finder: a recommender system for finding passengers and vacant taxis[J]. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(10): 2390-2403.
[15]	VERMA T, VARAKANTHAM P, KRAUS S, et al. Augmenting decisions of taxi drivers through reinforcement[C]//Proceedings of 2017 International Conference on Automated Planning and Scheduling. Palo Alto, USA: AAAI Press, 2017: 409-417.
[16]	CHEN Y P, FU Q W, ZHU J H, et al. Finding next high-quality passenger based on spatio-temporal big data[C]//Proceedings of 2020 IEEE International Conference on Cloud Computing and Big Data Analytics. Washington D. C., USA: IEEE Press, 2020: 447-452.
[17]	LI X J, SUN Y E, LIU Q, et al. Promise: a taxi recommender system based on inter-regional passenger mobility[C]//Proceedings of 2019 International Joint Conference on Neural Networks. Washington D. C., USA: IEEE Press, 2019: 1-8.
[18]	LI X L, PAN G, WU Z H, et al. Prediction of urban human mobility using large-scale taxi traces and its applications[J]. Frontiers of Computer Science, 2012, 6(1): 111-121.
[19]	HWANG R H, HSUEH Y L, CHEN Y T, et al. An effective taxi recommender system based on a spatio-temporal factor analysis model[J]. Information Sciences, 2015, 314(12): 28-40.
[20]	LUO Z W, LV H M, FANG F, et al. Dynamic taxi service planning by minimizing cruising distance without passengers[J]. IEEE Access, 2018, 6: 70005-70016.
[21]	LÜ H M, FANG F, ZHAO Y S, et al. A performance evaluation model for taxi cruising path recommendation system[C]//Proceedings of 2017 Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin, Germany: Springer, 2017: 156-167.
[22]	YU X L, GAO S, HU X B, et al. A Markov decision process approach to vacant taxi routing with e-hailing[J]. Transportation Research Part B: Methodological, 2019, 121(6): 114-134.
[23]	RONG H G, ZHOU X, YANG C, et al. The rich and the poor: a Markov decision process approach to optimizing taxi driver revenue efficiency[C]//Proceedings of 2016 ACM International on Conference on Information and Knowledge Management. New York, USA: ACM Press, 2016: 2329-2334.