
Computer Engineering ›› 2025, Vol. 51 ›› Issue (6): 275-285. doi: 10.19678/j.issn.1000-3428.0068573

• Mobile Internet and Communication Technology •

Multi-Objective Multicast Routing Algorithm Based on Multi-Step Reinforcement Learning

TIAN Jinwei1, LI Xiaole1,2, QIN Yao3,*, WANG Cuiping1, WANG Hua2

  1. School of Information Science and Engineering, Linyi University, Linyi 276000, Shandong, China
    2. School of Software, Shandong University, Jinan 250101, Shandong, China
    3. Shanghai Police College, Shanghai 200137, China
  • Received: 2023-10-15 Online: 2025-06-15 Published: 2024-05-22
  • Contact: QIN Yao

  • Supported by: Shandong Provincial Natural Science Foundation General Program (ZR2023MF090); Shandong Provincial Natural Science Foundation General Program (ZR2023MF062); Major Science and Technology Special Program of the Yunnan Provincial Science and Technology Department (202302AD080006)

Abstract:

Current networks suffer from over-provisioning, redundancy, and congestion, leading to high energy consumption and reduced user satisfaction. The multicast routing problem that jointly optimizes energy consumption and delay is an NP-complete problem. A multi-objective multicast routing algorithm based on multi-step Q-Learning is proposed to solve the delay- and energy-aware multicast routing problem under a Software Defined Network (SDN) architecture. The algorithm aims to reduce network energy consumption and delay while satisfying network performance and Quality of Service (QoS) requirements. Multi-step Q-Learning estimates the long-term reward of each path more accurately; by updating the Q-value at every step, the algorithm selects the optimal action at each node and ultimately finds the best path. Combining the rewards and value functions of multiple time steps enables faster convergence to the optimal policy. In addition, when setting the reward, each objective is assigned a different weight to balance its relative importance. Simulation results show that, compared with existing representative algorithms, the proposed algorithm effectively reduces network energy consumption and delay and improves network performance.
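
The abstract stops short of the update rule, so the following is a minimal Python sketch of the mechanism it describes: an n-step Q-Learning backup over hop-by-hop path decisions, with the reward defined as a weighted sum of delay and energy. All names, hyperparameters (ALPHA, GAMMA, EPSILON, N_STEPS), and objective weights (W_DELAY, W_ENERGY) are illustrative assumptions, not the paper's actual formulation.

    import random
    from collections import defaultdict

    # Assumed hyperparameters and per-objective weights (not from the paper).
    ALPHA, GAMMA, EPSILON, N_STEPS = 0.1, 0.9, 0.1, 3
    W_DELAY, W_ENERGY = 0.6, 0.4

    Q = defaultdict(float)   # Q[(node, next_hop)] -> estimated long-term reward

    def reward(delay, energy):
        """Weighted negative cost: lower delay and energy give a higher reward."""
        return -(W_DELAY * delay + W_ENERGY * energy)

    def choose_next_hop(node, neighbors):
        """Epsilon-greedy choice among a node's outgoing links."""
        if random.random() < EPSILON:
            return random.choice(neighbors)
        return max(neighbors, key=lambda nxt: Q[(node, nxt)])

    def n_step_update(transitions, graph):
        """n-step backup: G = r1 + g*r2 + ... + g^(n-1)*rn + g^n * max_a Q(s_n, a).

        transitions: list of the last N_STEPS (node, next_hop, reward) tuples,
        oldest first; graph: dict mapping each node to its neighbor list.
        """
        # Discounted sum of the buffered rewards.
        g_return = sum((GAMMA ** i) * r for i, (_, _, r) in enumerate(transitions))
        # Bootstrap from the best Q-value at the node reached after n hops.
        tail_node = transitions[-1][1]
        tail_value = max((Q[(tail_node, nxt)] for nxt in graph.get(tail_node, [])),
                         default=0.0)
        g_return += (GAMMA ** len(transitions)) * tail_value
        # Update the oldest buffered state-action pair toward the n-step return.
        node, hop, _ = transitions[0]
        Q[(node, hop)] += ALPHA * (g_return - Q[(node, hop)])

In use, an agent would extend a path hop by hop with choose_next_hop, buffer the last N_STEPS transitions, and call n_step_update after each hop once the buffer is full, so every Q-value update blends several steps of observed reward with a bootstrapped tail estimate.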

Key words: multicast routing, reinforcement learning, multi-objective optimization, energy consumption, delay
