
Computer Engineering, 2026, Vol. 52, Issue (5): 117-128. doi: 10.19678/j.issn.1000-3428.0070326

• Computational Intelligence and Pattern Recognition •

Optimization of Optimal Coalition Structure Generation Strategy Based on Improved DQN

ZHAO Shuxu, ZHOU Hongze*, WANG Xiaolong

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
  • Received: 2024-09-05  Revised: 2024-10-15  Online: 2026-05-15  Published: 2026-05-12
  • Corresponding author: ZHOU Hongze
  • About the authors:

    ZHAO Shuxu, male, professor, Ph.D.; his main research interests are intelligent transportation and edge computing.

    ZHOU Hongze (corresponding author), master's student.

    WANG Xiaolong, Ph.D.

  • Funding:
    Key Research and Development Program of Gansu Province (20YF8GA123)


Abstract:

Edge servers often have to form coalitions to execute tasks collaboratively when their resources are limited. Because server resource utilization changes dynamically as tasks execute, ensuring that tasks finish as quickly as possible while keeping the cost of restructuring coalitions low is a major challenge. To address this problem, a coalition structure optimization strategy based on the Double Deep Q-Network (DDQN) is proposed. First, taking the maximization of task completion efficiency and the minimization of coalition construction cost as the optimization objectives, the problem is modeled as a Cost-Introduced Markov Decision Process (CT-MDP) by defining the state space, action space, and reward function. Second, to counter the Q-value overestimation that easily arises in the high-dimensional state space of the CT-MDP, a lightweight optimal coalition structure search algorithm based on DDQN is proposed, in which two independent Q-networks reduce the accumulation of positive estimation error during updates. To satisfy the strict resource occupancy requirements of edge devices during training, the activation function is optimized to reduce the storage footprint of the training model. Finally, the proposed algorithm is compared with Q-learning, DQN, Dueling DQN, and other algorithms in simulation experiments. The results show that the proposed method converges stably and reduces coalition construction cost and resource occupancy by 20.36% and 12.12%, respectively, demonstrating its effectiveness.
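The decoupling of action selection from action evaluation that lets DDQN curb Q-value overestimation can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the batch size, the four candidate coalition actions, the reward values, and the discount factor below are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical Q-value estimates for a batch of 3 next states over 4
    # candidate coalition actions. q_online comes from the online network,
    # q_target from the periodically synchronized target network; the values
    # here are random placeholders.
    q_online = rng.normal(loc=1.0, scale=0.5, size=(3, 4))
    q_target = rng.normal(loc=1.0, scale=0.5, size=(3, 4))

    rewards = np.array([0.2, -0.1, 0.5])  # e.g., efficiency gain minus rebuild cost
    gamma = 0.99                          # discount factor

    # Vanilla DQN target: one network both selects and evaluates the greedy
    # action, so the max operator also maximizes over estimation noise,
    # producing an upward bias.
    dqn_target = rewards + gamma * q_target.max(axis=1)

    # Double DQN target: the online network selects the action, the target
    # network evaluates it, damping the positive cumulative error the
    # abstract refers to.
    greedy = q_online.argmax(axis=1)
    ddqn_target = rewards + gamma * q_target[np.arange(len(rewards)), greedy]

    print("DQN targets :", dqn_target)
    print("DDQN targets:", ddqn_target)

For the same pair of networks, the DDQN target can never exceed the DQN target pointwise, since evaluating the online network's greedy action under the target network is bounded by the target network's own maximum; this is the source of the reduced positive bias.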

Key words: Artificial Intelligence (AI), Mobile Edge Computing (MEC), limited computing resources, resource scheduling, coalition structure generation, Deep Reinforcement Learning (DRL)