
Computer Engineering ›› 2025, Vol. 51 ›› Issue (5): 33-42. doi: 10.19678/j.issn.1000-3428.0069478

• Space-Air-Ground Integrated Computing Power Network •

Traffic Signal Control Based on Deep Reinforcement Learning A2C Integrated with State Prediction

YE Baolin1,2, SUN Ruitao1,2, LI Lingxi3, WU Weimin4

  1. School of Information Science and Engineering, Jiaxing University, Jiaxing 314001, Zhejiang, China;
    2. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China;
    3. Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette 47907, USA;
    4. State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, Zhejiang, China
  • Received: 2024-03-04  Revised: 2024-07-25  Online: 2025-05-15  Published: 2025-05-10
  • Corresponding author: YE Baolin, E-mail: yebaolin@zjxu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61603154); Natural Science Foundation of Zhejiang Province (LTGS23F030002); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01174); Jiaxing Applied Basic Research Project (2023AY11034); Open Research Project of the State Key Laboratory of Industrial Control Technology (ICT2022B52).

Traffic Signal Control Based on Deep Reinforcement Learning A2C Integrated with State Prediction

YE Baolin1,2, SUN Ruitao1,2, LI Lingxi3, WU Weimin4   

  1. School of Information Science and Engineering, Jiaxing University, Jiaxing 314001, Zhejiang, China;
    2. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China;
    3. Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette 47907, USA;
    4. State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, Zhejiang, China
  • Received:2024-03-04 Revised:2024-07-25 Online:2025-05-15 Published:2025-05-10

Abstract: Existing reinforcement learning-based traffic signal control methods mainly use historical traffic states and the real-time traffic state at the current time step to determine the control strategy for the next time step, so the control strategy always lags behind the traffic state by one time step. To address this problem, a traffic signal control method based on deep reinforcement learning Advantage Actor-Critic (A2C) integrated with traffic state prediction is proposed. First, to obtain the traffic states of future time steps and ensure that the formulated control strategy responds more accurately to real-time decision-making needs, a Long Short-Term Memory (LSTM) network is designed to predict the future traffic states of the road network. Second, to improve the accuracy and robustness of the data fed into the deep reinforcement learning model, a Kalman filter is designed to fuse the collected historical traffic state data with the future traffic state data predicted by the LSTM network. Third, to enable the deep reinforcement learning model to capture the temporal dependencies in traffic flow more comprehensively and to achieve more efficient and stable traffic signal control decisions, an A2C algorithm integrated with a bidirectional LSTM network is proposed. Finally, simulation results on the Simulation of Urban Mobility (SUMO) platform show that, compared with traditional traffic signal control methods and the deep reinforcement learning A2C-based method, the proposed method achieves better traffic signal control performance under all three traffic flow conditions: low-peak, off-peak, and peak.
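The Kalman-filter fusion step described in the abstract combines two noisy estimates of the same traffic state: a measured (historical) value and an LSTM-predicted value. A minimal scalar sketch of that idea is shown below; the variances and example numbers are illustrative assumptions, not values from the paper.

```python
# Sketch of variance-weighted fusion of a measured traffic state with a
# predicted one, in the style of a scalar Kalman update. Illustrative only.

def kalman_fuse(measured, var_measured, predicted, var_predicted):
    """Fuse two noisy estimates of one quantity (e.g. queue length).

    The Kalman gain weights the measurement against the prediction by
    their variances; the fused estimate has lower variance than either.
    """
    gain = var_predicted / (var_predicted + var_measured)
    fused = predicted + gain * (measured - predicted)
    fused_var = (1.0 - gain) * var_predicted
    return fused, fused_var

# Example: a detector reports 12 queued vehicles (variance 4) while the
# predictor expects 10 (variance 1); the fused estimate leans toward the
# lower-variance prediction.
fused, fused_var = kalman_fuse(12.0, 4.0, 10.0, 1.0)
```

Because the gain here is 0.2, the fused estimate is 10.4 with variance 0.8, tighter than either input.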

Key words: traffic signal control, Advantage Actor-Critic (A2C), traffic state prediction, bidirectional Long Short-Term Memory (LSTM) network

Abstract: Existing reinforcement learning-based traffic signal control methods primarily use historical traffic states and the real-time traffic state at the current time step to determine the control strategy for the next time step. As a result, the control strategy always lags behind the actual traffic state by one time step. To address this issue, this study proposes a traffic signal control method based on deep reinforcement learning Advantage Actor-Critic (A2C) integrated with traffic state prediction. First, a Long Short-Term Memory (LSTM) network is designed to predict the future traffic states of a road network, so that the traffic states of future time steps are available and the formulated control strategy can respond more accurately to real-time decision-making requirements. Second, a Kalman filter is designed to fuse the collected historical traffic state data with the future traffic state data predicted by the LSTM network, improving the accuracy and robustness of the data input into the deep reinforcement learning model. Third, an A2C algorithm integrated with a bidirectional LSTM network is proposed, which allows the deep reinforcement learning model to capture the temporal dependencies within traffic flow more fully and to make more efficient and stable traffic signal control decisions. Finally, simulations conducted on the Simulation of Urban Mobility (SUMO) platform demonstrate that the proposed method achieves better traffic signal control performance than both traditional traffic signal control methods and the deep reinforcement learning A2C-based method under low-peak, off-peak, and peak traffic conditions.
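At the core of any A2C-style controller like the one the abstract describes, the critic's value estimates convert an observed reward into an advantage that scales the actor's policy update. The one-step temporal-difference form of that computation can be sketched as follows; the reward convention (e.g. negative queue length) and the numbers are illustrative assumptions, not taken from the paper.

```python
# Sketch of the one-step advantage used in A2C-style actor-critic updates:
# A(s, a) = r + gamma * V(s') - V(s). Positive advantage means the action
# did better than the critic expected. Illustrative values only.

def td_advantage(reward, value_s, value_s_next, gamma=0.99):
    """One-step temporal-difference advantage estimate."""
    return reward + gamma * value_s_next - value_s

# Example: reward -3 (e.g. negative total queue length after a phase
# decision), with critic estimates V(s) = -40 and V(s') = -35.
adv = td_advantage(-3.0, -40.0, -35.0)
```

Here the advantage is positive (2.35), so the sampled phase decision would be reinforced; in the full method this scalar would weight the actor's log-probability gradient.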

Key words: traffic signal control, Advantage Actor-Critic (A2C), traffic state prediction, bidirectional Long Short-Term Memory (LSTM) network
