Traffic Signal Control Based on Deep Reinforcement Learning A2C Integrated with State Prediction

doi:10.19678/j.issn.1000-3428.0069478

Abstract

Existing reinforcement learning-based traffic signal control methods mainly use historical traffic states and the real-time traffic state at the current time step to determine the control strategy for the next time step. As a result, the control strategy always lags behind the actual traffic state by one time step. To address this issue, this study proposes a traffic signal control method based on deep reinforcement learning Advantage Actor Critic (A2C) integrated with traffic state prediction. First, a Long Short-Term Memory (LSTM) network is designed to predict the traffic states of the road network at future time steps, so that the formulated control strategy responds more accurately to decision-making requirements under real-time traffic conditions. Second, a Kalman filter is designed to fuse the collected historical traffic state data with the future traffic state data predicted by the LSTM network, which improves the accuracy and robustness of the data fed into the deep reinforcement learning model. Third, an A2C algorithm integrated with a bidirectional LSTM network is proposed, allowing the deep reinforcement learning model to capture the temporal dependencies in traffic flow more fully and to make more efficient and stable traffic signal control decisions. Finally, simulation tests on the Simulation of Urban Mobility (SUMO) platform show that the proposed method achieves better traffic signal control performance under low-peak, off-peak, and peak traffic conditions than both traditional traffic signal control methods and the traffic signal control method based on deep reinforcement learning A2C.
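The fusion step named in the abstract can be illustrated with a scalar Kalman update. The abstract does not give the paper's filter design, so the following is only a minimal sketch under stated assumptions: the LSTM output is treated as the prior estimate, the collected detector reading as the observation, and the function name `kalman_fuse`, the variances, and the queue-length values are all hypothetical.

```python
def kalman_fuse(predicted, pred_var, observed, obs_var):
    """One scalar Kalman update: blend a prediction with an observation.

    predicted / pred_var: prior estimate and its variance (assumed here
    to come from the LSTM predictor).
    observed / obs_var: measurement and its variance (assumed here to
    come from collected detector data).
    """
    if pred_var < 0 or obs_var < 0 or pred_var + obs_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    # The gain moves toward 1 as the prediction becomes less trustworthy.
    gain = pred_var / (pred_var + obs_var)
    fused = predicted + gain * (observed - predicted)
    fused_var = (1.0 - gain) * pred_var
    return fused, fused_var


# Hypothetical queue lengths (vehicles) on four approaches of one intersection.
lstm_prediction = [12.0, 7.0, 3.0, 9.0]
detector_reading = [10.0, 8.0, 3.0, 13.0]

# Equal variances are an illustrative assumption; they give a gain of 0.5.
fused_state = [
    kalman_fuse(p, 4.0, z, 4.0)[0]
    for p, z in zip(lstm_prediction, detector_reading)
]
print(fused_state)
```

With equal variances the fused value is the midpoint of prediction and observation. A smaller `obs_var` would pull the result toward the detector reading, and a smaller `pred_var` toward the LSTM output.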

Key words: traffic signal control, Advantage Actor Critic (A2C), traffic state prediction, bidirectional Long Short-Term Memory (LSTM) network
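For readers unfamiliar with A2C, the quantity that gives the algorithm its name is the advantage: the bootstrapped discounted return minus the critic's value estimate. The sketch below shows only that generic computation, not the authors' network or reward design. The function name `a2c_advantages`, the reward values (standing in for something like negative queue length), and the discount factor are assumptions for illustration.

```python
def a2c_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute discounted returns and advantages for one rollout.

    rewards: per-step rewards r_t collected by the actor.
    values: critic estimates V(s_t) for the same steps.
    bootstrap_value: critic estimate for the state after the last step
    (0.0 if the episode ended there).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the rollout was than the critic expected.
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages


# Hypothetical three-step rollout; gamma is set to 0.5 to keep the numbers simple.
returns, advantages = a2c_advantages(
    rewards=[-4.0, -2.0, -1.0],
    values=[-5.0, -3.0, -1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
print(returns, advantages)
```

In a full A2C update the advantages weight the actor's log-probability gradient, while the critic is regressed toward the returns.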

YE Baolin, SUN Ruitao, LI Lingxi, WU Weimin. Traffic Signal Control Based on Deep Reinforcement Learning A2C Integrated with State Prediction[J]. Computer Engineering, 2025, 51(5): 33-42.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069478

https://www.ecice06.com/EN/Y2025/V51/I5/33

Figures/Tables 14

Fig.1 Schematic diagram of road network

Fig.2 Schematic diagram of single intersection sampling state

Fig.3 Schematic diagram of action space

Fig.4 Schematic diagram of multiple intersection sampling state

Fig.5 State prediction process based on LSTM

Fig.6 Deep reinforcement learning A2C model integrated with bidirectional LSTM

Fig.7 Schematic diagram of reinforcement learning A2C model integrated with traffic state prediction

Fig.8 Low-peak traffic flow

Fig.9 Off-peak traffic flow

Fig.10 Peak traffic flow
