
Computer Engineering ›› 2025, Vol. 51 ›› Issue (6): 275-285. doi: 10.19678/j.issn.1000-3428.0068573

• Mobile Internet and Communication Technology •

Multi-Objective Multicast Routing Algorithm Based on Multi-Step Reinforcement Learning

TIAN Jinwei1, LI Xiaole1,2, QIN Yao3,*, WANG Cuiping1, WANG Hua2

  1. School of Information Science and Engineering, Linyi University, Linyi 276000, Shandong, China
    2. School of Software, Shandong University, Jinan 250101, Shandong, China
    3. Shanghai Police College, Shanghai 200137, China
  • Received: 2023-10-15 Online: 2025-06-15 Published: 2024-05-22
  • Contact: QIN Yao

  • Supported by: Shandong Provincial Natural Science Foundation General Program (ZR2023MF090); Shandong Provincial Natural Science Foundation General Program (ZR2023MF062); Major Science and Technology Special Program of the Yunnan Provincial Science and Technology Department (202302AD080006)

Abstract:

Current networks suffer from over-provisioning, redundancy, and congestion, leading to high energy consumption and reduced user satisfaction. The multicast routing problem that jointly optimizes energy consumption and delay is an NP-complete problem. A multi-objective multicast routing algorithm based on multi-step Q-Learning is proposed to solve the delay- and energy-aware multicast routing problem under a Software Defined Network (SDN) architecture. The algorithm aims to reduce network energy consumption and delay while satisfying network performance and Quality of Service (QoS) requirements. Multi-step Q-Learning estimates the long-term reward of each path more accurately; by updating the Q-value at every step, the algorithm selects the optimal action at each node and ultimately finds the best path. Combining the rewards and value functions of multiple time steps enables faster convergence to the optimal policy. In addition, when setting the reward, each objective is assigned a different weight to balance its relative importance. Simulation results show that, compared with existing representative algorithms, the proposed algorithm effectively reduces network energy consumption and delay and improves network performance.
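
The abstract stops short of the update rule, so the following is a minimal Python sketch of the mechanism it describes: an n-step Q-Learning backup over hop-by-hop path decisions, with the reward defined as a weighted sum of delay and energy. All names, hyperparameters (ALPHA, GAMMA, EPSILON, N_STEPS), and objective weights (W_DELAY, W_ENERGY) are illustrative assumptions, not the paper's actual formulation.

    import random
    from collections import defaultdict

    # Assumed hyperparameters and per-objective weights (not from the paper).
    ALPHA, GAMMA, EPSILON, N_STEPS = 0.1, 0.9, 0.1, 3
    W_DELAY, W_ENERGY = 0.6, 0.4

    Q = defaultdict(float)   # Q[(node, next_hop)] -> estimated long-term reward

    def reward(delay, energy):
        """Weighted negative cost: lower delay and energy give a higher reward."""
        return -(W_DELAY * delay + W_ENERGY * energy)

    def choose_next_hop(node, neighbors):
        """Epsilon-greedy choice among a node's outgoing links."""
        if random.random() < EPSILON:
            return random.choice(neighbors)
        return max(neighbors, key=lambda nxt: Q[(node, nxt)])

    def n_step_update(transitions, graph):
        """n-step backup: G = r1 + g*r2 + ... + g^(n-1)*rn + g^n * max_a Q(s_n, a).

        transitions: list of the last N_STEPS (node, next_hop, reward) tuples,
        oldest first; graph: dict mapping each node to its neighbor list.
        """
        # Discounted sum of the buffered rewards.
        g_return = sum((GAMMA ** i) * r for i, (_, _, r) in enumerate(transitions))
        # Bootstrap from the best Q-value at the node reached after n hops.
        tail_node = transitions[-1][1]
        tail_value = max((Q[(tail_node, nxt)] for nxt in graph.get(tail_node, [])),
                         default=0.0)
        g_return += (GAMMA ** len(transitions)) * tail_value
        # Update the oldest buffered state-action pair toward the n-step return.
        node, hop, _ = transitions[0]
        Q[(node, hop)] += ALPHA * (g_return - Q[(node, hop)])

In use, an agent would extend a path hop by hop with choose_next_hop, buffer the last N_STEPS transitions, and call n_step_update after each hop once the buffer is full, so every Q-value update blends several steps of observed reward with a bootstrapped tail estimate.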

Key words: multicast routing, reinforcement learning, multi-objective optimization, energy consumption, delay
