
Computer Engineering, 2026, Vol. 52, Issue (5): 117-128. doi: 10.19678/j.issn.1000-3428.0070326

• Computational Intelligence and Pattern Recognition •

Optimization of Optimal Coalition Structure Generation Strategy Based on Improved DQN

ZHAO Shuxu, ZHOU Hongze*, WANG Xiaolong

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
  • Received: 2024-09-05  Revised: 2024-10-15  Online: 2026-05-15  Published: 2026-05-12
  • Corresponding author: ZHOU Hongze
  • About the authors:

    ZHAO Shuxu, male, professor, Ph.D.; his main research interests are intelligent transportation and edge computing.

    ZHOU Hongze (corresponding author), master's student.

    WANG Xiaolong, Ph.D.

  • Funding:
    Key Research and Development Program of Gansu Province (20YF8GA123)


Abstract:

Edge servers often have to form coalitions to execute tasks collaboratively when their resources are limited. Because server resource utilization changes dynamically as tasks execute, ensuring that tasks finish as quickly as possible while keeping the cost of restructuring coalitions low is a major challenge. To address this problem, a coalition structure optimization strategy based on the Double Deep Q-Network (DDQN) is proposed. First, taking the maximization of task completion efficiency and the minimization of coalition construction cost as the optimization objectives, the problem is modeled as a Cost-Introduced Markov Decision Process (CT-MDP) by defining the state space, action space, and reward function. Second, to counter the Q-value overestimation that easily arises in the high-dimensional state space of the CT-MDP, a lightweight optimal coalition structure search algorithm based on DDQN is proposed, in which two independent Q-networks reduce the accumulation of positive estimation error during updates. To satisfy the strict resource occupancy requirements of edge devices during training, the activation function is optimized to reduce the storage footprint of the training model. Finally, the proposed algorithm is compared with Q-learning, DQN, Dueling DQN, and other algorithms in simulation experiments. The results show that the proposed method converges stably and reduces coalition construction cost and resource occupancy by 20.36% and 12.12%, respectively, demonstrating its effectiveness.
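The decoupling of action selection from action evaluation that lets DDQN curb Q-value overestimation can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the batch size, the four candidate coalition actions, the reward values, and the discount factor below are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical Q-value estimates for a batch of 3 next states over 4
    # candidate coalition actions. q_online comes from the online network,
    # q_target from the periodically synchronized target network; the values
    # here are random placeholders.
    q_online = rng.normal(loc=1.0, scale=0.5, size=(3, 4))
    q_target = rng.normal(loc=1.0, scale=0.5, size=(3, 4))

    rewards = np.array([0.2, -0.1, 0.5])  # e.g., efficiency gain minus rebuild cost
    gamma = 0.99                          # discount factor

    # Vanilla DQN target: one network both selects and evaluates the greedy
    # action, so the max operator also maximizes over estimation noise,
    # producing an upward bias.
    dqn_target = rewards + gamma * q_target.max(axis=1)

    # Double DQN target: the online network selects the action, the target
    # network evaluates it, damping the positive cumulative error the
    # abstract refers to.
    greedy = q_online.argmax(axis=1)
    ddqn_target = rewards + gamma * q_target[np.arange(len(rewards)), greedy]

    print("DQN targets :", dqn_target)
    print("DDQN targets:", ddqn_target)

For the same pair of networks, the DDQN target can never exceed the DQN target pointwise, since evaluating the online network's greedy action under the target network is bounded by the target network's own maximum; this is the source of the reduced positive bias.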

Key words: Artificial Intelligence (AI), Mobile Edge Computing (MEC), limited computing resources, resource scheduling, coalition structure generation, Deep Reinforcement Learning (DRL)