
计算机工程 (Computer Engineering) ›› 2025, Vol. 51 ›› Issue (7): 385-396. doi: 10.19678/j.issn.1000-3428.0068889

• Development Research and Engineering Application •


Vehicle Intelligent Control Method Based on Deep Reinforcement Learning PPO

YE Baolin1,2,*, WANG Xin1,2, LI Lingxi3, WU Weimin4

  1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
    2. School of Information Science and Engineering, Jiaxing University, Jiaxing 314001, Zhejiang, China
    3. Department of Electrical and Computer Engineering, Purdue School of Engineering and Technology, Indiana University-Purdue University Indianapolis, Indianapolis 46202, USA
    4. Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, Zhejiang, China
  • Received: 2023-11-22  Online: 2025-07-15  Published: 2024-06-19
  • Contact: YE Baolin
  • Supported by: Jiaxing Applied Basic Research Program (2023AY11034); Zhejiang Provincial Natural Science Foundation (LTGS23F030002); Zhejiang Provincial "Pioneer" and "Leading Goose" R&D Program (2023C01174); National Natural Science Foundation of China (61603154); Open Project of the State Key Laboratory of Industrial Control Technology (ICT2022B52)


Abstract:

This study proposes a vehicle intelligent control method based on Proximal Policy Optimization (PPO) to improve driving efficiency and reduce traffic accidents in mixed-traffic environments on highways. First, a hierarchical control framework is constructed that integrates deep reinforcement learning with traditional Proportional-Integral-Derivative (PID) control: an upper-level deep reinforcement learning agent determines the control strategy, and a lower-level PID controller executes it. Second, to improve driving efficiency, an advantage distance is defined and used to filter the observed environmental state matrix, helping the ego vehicle select lanes with a longer advantage distance when changing lanes. Based on the defined advantage distance, a new state collection method is proposed to reduce the amount of data to be processed and thereby accelerate the convergence of the deep reinforcement learning model. In addition, a multi-objective reward function is designed to balance vehicle safety, driving efficiency, and stability. Finally, simulation tests are conducted in Highway_env, a vehicle reinforcement learning simulation environment built on Gym, and the performance of the proposed method at different target speeds is analyzed and discussed. The simulation results show that the proposed method converges faster than the Deep Q-Network (DQN) method and enables vehicles to complete driving tasks safely and smoothly at both target speeds.
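To make the hierarchical design described in the abstract concrete, the sketch below pairs a stand-in upper-level policy with lower-level PID controllers, and includes one plausible reading of the advantage distance (the free gap ahead of the ego vehicle in each lane). All names, gains, and interfaces here (PIDController, advantage_distance, upper_policy) are illustrative assumptions; the abstract does not specify the implementation.

```python
# Minimal sketch of the two-layer control idea: the upper layer picks
# set-points (target lane, target speed), the lower layer tracks them with PID.

class PIDController:
    """Classic PID law: u = Kp*e + Ki*integral(e) + Kd*(de/dt)."""
    def __init__(self, kp: float, ki: float, kd: float, dt: float = 0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self._integral = 0.0
        self._prev_error = 0.0

    def step(self, error: float) -> float:
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative

def advantage_distance(lanes: dict, ego_x: float) -> dict:
    """Per lane, the free gap ahead of the ego vehicle; the paper's
    'advantage distance' is assumed to be of this flavour."""
    gaps = {}
    for lane, leader_positions in lanes.items():
        ahead = [x - ego_x for x in leader_positions if x > ego_x]
        gaps[lane] = min(ahead) if ahead else float("inf")
    return gaps

def upper_policy(lanes: dict, ego_x: float, target_speed: float):
    """Stand-in for the trained PPO agent: choose the lane with the
    longest advantage distance and hold the target speed."""
    gaps = advantage_distance(lanes, ego_x)
    best_lane = max(gaps, key=gaps.get)
    return best_lane, target_speed

# Lower layer: PID controllers track the set-points chosen by the upper layer.
speed_pid = PIDController(kp=0.8, ki=0.05, kd=0.1)
steer_pid = PIDController(kp=1.2, ki=0.0, kd=0.3)

lanes = {0: [60.0, 140.0], 1: [95.0], 2: [35.0]}  # leader positions per lane (m)
ego_x, ego_speed, ego_offset = 30.0, 22.0, 0.4    # position, speed, lateral offset
target_lane, target_speed = upper_policy(lanes, ego_x, target_speed=25.0)
throttle = speed_pid.step(target_speed - ego_speed)
steering = steer_pid.step(0.0 - ego_offset)       # steer back to lane centre
print(f"lane={target_lane}, throttle={throttle:.2f}, steering={steering:.2f}")
```

In this sketch the upper layer only outputs set-points; substituting a trained PPO policy for upper_policy preserves the same division of labour, with PID smoothing the low-level actuation.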

Key words: Proximal Policy Optimization (PPO), vehicle control, hierarchical control framework, multi-objective reward function, Deep Q-Network (DQN)
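For the multi-objective reward, one plausible shape is a weighted sum of safety, efficiency, and stability terms; the weights and term definitions below are assumptions for illustration, not the paper's.

```python
# Hypothetical multi-objective reward balancing safety, efficiency, stability.

def multi_objective_reward(collided: bool, speed: float, target_speed: float,
                           accel: float, changed_lane: bool,
                           w_safe: float = 10.0, w_eff: float = 1.0,
                           w_stab: float = 0.2) -> float:
    # Safety: large penalty on collision.
    r_safety = -1.0 if collided else 0.0
    # Efficiency: 1 at the target speed, falling off linearly with the error.
    r_eff = 1.0 - abs(speed - target_speed) / max(target_speed, 1e-6)
    # Stability: penalise harsh accelerations and gratuitous lane changes.
    r_stab = -abs(accel) - (0.5 if changed_lane else 0.0)
    return w_safe * r_safety + w_eff * r_eff + w_stab * r_stab

# Example: cruising near a 25 m/s target with mild acceleration.
print(multi_objective_reward(collided=False, speed=24.0,
                             target_speed=25.0, accel=0.3, changed_lane=False))
```

If the environment is the open-source highway-env package on Gym/Gymnasium, a PPO agent is commonly trained with an off-the-shelf library such as Stable-Baselines3; a minimal pairing, not necessarily the setup used in the paper, looks like:

```python
import gymnasium
import highway_env  # noqa: F401  (registers the highway-v0 environments)
from stable_baselines3 import PPO

env = gymnasium.make("highway-fast-v0")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```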