
计算机工程 (Computer Engineering) ›› 2025, Vol. 51 ›› Issue (7): 385-396. doi: 10.19678/j.issn.1000-3428.0068889

• Development Research and Engineering Application •


Vehicle Intelligent Control Method Based on Deep Reinforcement Learning PPO

YE Baolin1,2,*, WANG Xin1,2, LI Lingxi3, WU Weimin4

  1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
    2. School of Information Science and Engineering, Jiaxing University, Jiaxing 314001, Zhejiang, China
    3. Department of Electrical and Computer Engineering, Purdue School of Engineering and Technology, Indiana University-Purdue University Indianapolis, Indianapolis 46202, USA
    4. Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, Zhejiang, China
  • Received: 2023-11-22  Online: 2025-07-15  Published: 2024-06-19
  • Contact: YE Baolin
  • Supported by: Jiaxing Applied Basic Research Program (2023AY11034); Zhejiang Provincial Natural Science Foundation (LTGS23F030002); Zhejiang Provincial "Pioneer" and "Leading Goose" R&D Program (2023C01174); National Natural Science Foundation of China (61603154); Open Project of the State Key Laboratory of Industrial Control Technology (ICT2022B52)


Abstract:

This study proposes a vehicle intelligent control method based on Proximal Policy Optimization (PPO) to improve driving efficiency and reduce traffic accidents in mixed-traffic environments on highways. First, a hierarchical control framework is constructed that integrates deep reinforcement learning with traditional Proportional-Integral-Derivative (PID) control: an upper-level deep reinforcement learning agent determines the control strategy, and a lower-level PID controller executes it. Second, to improve driving efficiency, an advantage distance is defined and used to filter the observed environmental state matrix, helping the ego vehicle select lanes with a longer advantage distance when changing lanes. Based on the defined advantage distance, a new state collection method is proposed to reduce the amount of data to be processed and thereby accelerate the convergence of the deep reinforcement learning model. In addition, a multi-objective reward function is designed to balance vehicle safety, driving efficiency, and stability. Finally, simulation tests are conducted in Highway_env, a vehicle reinforcement learning simulation environment built on Gym, and the performance of the proposed method at different target speeds is analyzed and discussed. The simulation results show that the proposed method converges faster than the Deep Q-Network (DQN) method and enables vehicles to complete driving tasks safely and smoothly at both target speeds.
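To make the hierarchical design described in the abstract concrete, the sketch below pairs a stand-in upper-level policy with lower-level PID controllers, and includes one plausible reading of the advantage distance (the free gap ahead of the ego vehicle in each lane). All names, gains, and interfaces here (PIDController, advantage_distance, upper_policy) are illustrative assumptions; the abstract does not specify the implementation.

```python
# Minimal sketch of the two-layer control idea: the upper layer picks
# set-points (target lane, target speed), the lower layer tracks them with PID.

class PIDController:
    """Classic PID law: u = Kp*e + Ki*integral(e) + Kd*(de/dt)."""
    def __init__(self, kp: float, ki: float, kd: float, dt: float = 0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self._integral = 0.0
        self._prev_error = 0.0

    def step(self, error: float) -> float:
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative

def advantage_distance(lanes: dict, ego_x: float) -> dict:
    """Per lane, the free gap ahead of the ego vehicle; the paper's
    'advantage distance' is assumed to be of this flavour."""
    gaps = {}
    for lane, leader_positions in lanes.items():
        ahead = [x - ego_x for x in leader_positions if x > ego_x]
        gaps[lane] = min(ahead) if ahead else float("inf")
    return gaps

def upper_policy(lanes: dict, ego_x: float, target_speed: float):
    """Stand-in for the trained PPO agent: choose the lane with the
    longest advantage distance and hold the target speed."""
    gaps = advantage_distance(lanes, ego_x)
    best_lane = max(gaps, key=gaps.get)
    return best_lane, target_speed

# Lower layer: PID controllers track the set-points chosen by the upper layer.
speed_pid = PIDController(kp=0.8, ki=0.05, kd=0.1)
steer_pid = PIDController(kp=1.2, ki=0.0, kd=0.3)

lanes = {0: [60.0, 140.0], 1: [95.0], 2: [35.0]}  # leader positions per lane (m)
ego_x, ego_speed, ego_offset = 30.0, 22.0, 0.4    # position, speed, lateral offset
target_lane, target_speed = upper_policy(lanes, ego_x, target_speed=25.0)
throttle = speed_pid.step(target_speed - ego_speed)
steering = steer_pid.step(0.0 - ego_offset)       # steer back to lane centre
print(f"lane={target_lane}, throttle={throttle:.2f}, steering={steering:.2f}")
```

In this sketch the upper layer only outputs set-points; substituting a trained PPO policy for upper_policy preserves the same division of labour, with PID smoothing the low-level actuation.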

Key words: Proximal Policy Optimization (PPO), vehicle control, hierarchical control framework, multi-objective reward function, Deep Q-Network (DQN)
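For the multi-objective reward, one plausible shape is a weighted sum of safety, efficiency, and stability terms; the weights and term definitions below are assumptions for illustration, not the paper's.

```python
# Hypothetical multi-objective reward balancing safety, efficiency, stability.

def multi_objective_reward(collided: bool, speed: float, target_speed: float,
                           accel: float, changed_lane: bool,
                           w_safe: float = 10.0, w_eff: float = 1.0,
                           w_stab: float = 0.2) -> float:
    # Safety: large penalty on collision.
    r_safety = -1.0 if collided else 0.0
    # Efficiency: 1 at the target speed, falling off linearly with the error.
    r_eff = 1.0 - abs(speed - target_speed) / max(target_speed, 1e-6)
    # Stability: penalise harsh accelerations and gratuitous lane changes.
    r_stab = -abs(accel) - (0.5 if changed_lane else 0.0)
    return w_safe * r_safety + w_eff * r_eff + w_stab * r_stab

# Example: cruising near a 25 m/s target with mild acceleration.
print(multi_objective_reward(collided=False, speed=24.0,
                             target_speed=25.0, accel=0.3, changed_lane=False))
```

If the environment is the open-source highway-env package on Gym/Gymnasium, a PPO agent is commonly trained with an off-the-shelf library such as Stable-Baselines3; a minimal pairing, not necessarily the setup used in the paper, looks like:

```python
import gymnasium
import highway_env  # noqa: F401  (registers the highway-v0 environments)
from stable_baselines3 import PPO

env = gymnasium.make("highway-fast-v0")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```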