Computer Engineering (计算机工程), 2022, Vol. 48, Issue (12): 296-303, 311. doi: 10.19678/j.issn.1000-3428.0063438

• Development Research and Engineering Application •

Reinforcement Learning Online Car-Hailing Order Dispatch Based on Joint Q-value Decomposition

HUANG Xiaohui, ZHANG Xiong, YANG Kaiming, XIONG Liyan

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
  • Received: 2021-12-02; Revised: 2022-01-13; Published: 2022-12-07
  • About the authors: HUANG Xiaohui (born 1984), male, associate professor, Ph.D.; his main research interests include deep learning and intelligent transportation. ZHANG Xiong and YANG Kaiming are master's degree candidates. XIONG Liyan is a professor.
  • Funding: National Natural Science Foundation of China (62062033, 62067002, 61967006); Key Program for Young Scholars of the Jiangxi Provincial Natural Science Foundation (20192ACBL21006); General Program of the Jiangxi Provincial Natural Science Foundation (20212BAB202008).


Abstract: Resource utilization and travel efficiency are often reduced by the unreasonable dispatch of online car-hailing orders. Based on a joint Q-value function decomposition framework, two order dispatch methods, ODDRL and LF-ODDRL, are proposed to dispatch user order requests to appropriate online car-hailing drivers efficiently and to minimize passenger waiting time. To capture the dynamically changing relationship between random demand and supply in the online car-hailing order dispatch scenario, the city is defined as a quadrilateral grid map and each vehicle is treated as an independent agent. A multi-agent Markov Decision Process (MDP) model is built, and the agents are trained by maximizing entropy together with the cumulative reward. The joint Q-value function of the multiple agents is transformed into an easily decomposable function, so that the actions selected by the joint Q-value function are consistent with those selected by each individual agent's value function. In addition, an action search function is designed that combines the advantages of centralized training with decentralized execution, allowing each vehicle to solve the order-matching problem in a distributed manner without coordinating with other vehicles, thereby reducing complexity. The experimental results demonstrate that the proposed ODDRL and LF-ODDRL scale better than methods such as Random, Greedy, and QMIX. On a 500×500 grid with 10 passengers and 2 vehicles, the total passenger pick-up time is shortened by 5% and 12%, respectively, compared with QMIX.
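The abstract states the key structural property, that the greedy action of the joint Q-value function must agree with each agent's local greedy action, but does not give the concrete ODDRL/LF-ODDRL architecture. The following is a minimal PyTorch sketch of that idea only, using an additive (VDN-style) mixer as one simple decomposition with the required monotonicity; the class names, layer sizes, and the additive mixer itself are illustrative assumptions rather than the paper's design, and the maximum-entropy training term is omitted.

import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    # Per-vehicle Q-network: maps a local observation to Q-values over
    # candidate order-matching actions. (Hypothetical architecture.)
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

class AdditiveMixer(nn.Module):
    # VDN-style mixer: the joint Q is the sum of the per-agent chosen
    # Q-values. Summation is monotone in each argument, so maximizing the
    # joint Q is equivalent to each agent taking its own local argmax --
    # the joint/individual action consistency the abstract describes.
    def forward(self, chosen_qs):
        return chosen_qs.sum(dim=1)  # (batch, n_agents) -> (batch,)

# Toy usage: 2 vehicles, 5 candidate actions each.
n_agents, obs_dim, n_actions, batch = 2, 8, 5, 4
agents = [AgentQNet(obs_dim, n_actions) for _ in range(n_agents)]
mixer = AdditiveMixer()

obs = torch.randn(batch, n_agents, obs_dim)  # local observations per vehicle
qs = torch.stack([agents[i](obs[:, i]) for i in range(n_agents)], dim=1)

# Decentralized execution: each vehicle picks its own greedy action,
# with no coordination between vehicles.
actions = qs.argmax(dim=-1)  # (batch, n_agents)
chosen_qs = qs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
q_joint = mixer(chosen_qs)  # used only during centralized training

Because the mixer is monotone in every per-agent Q-value, training can be centralized on q_joint while execution stays fully decentralized: at dispatch time each vehicle only evaluates its own network.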

Key words: multi-agent, reinforcement learning, value function, order dispatch, neural network
