
Computer Engineering, 2025, Vol. 51, Issue 5: 83-92. doi: 10.19678/j.issn.1000-3428.0068958

• Artificial Intelligence and Pattern Recognition •

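SD-IoT Controller Placement Based on Multi-Agent Deep Reinforcement Learning

LÜ Chaofeng1,2, XU Pengfei1, LUO Di1, LIU Jinping1

  1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China;
  2. Zhangjiajie Institute of Aeronautical Engineering, Zhangjiajie 427000, Hunan, China
  • Received: 2023-12-06 Revised: 2024-02-22 Online: 2025-05-15 Published: 2024-05-09
  • Corresponding author: XU Pengfei, E-mail: xupf@hunnu.edu.cn
  • Funding:
    Scientific Research Project of the Education Department of Hunan Province (23C1042).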

Abstract: The surge in Internet of Things (IoT) traffic affects data transmission for devices such as sensors. Software-Defined Networking (SDN) can optimize network performance and improve data transmission quality. However, continually changing network states in the IoT, such as traffic, degrade the performance of the SDN control plane. This study addresses the dynamic controller placement problem in the Software-Defined IoT (SD-IoT) so that control plane performance is maintained as traffic changes. Given the energy consumption of IoT nodes and the characteristics of their data transmission, controller placement jointly considers delay, control reliability, and energy consumption, and the problem is modeled as a Markov game. To balance the performance of each individual controller against that of the control plane as a whole, the problem is solved with multi-agent deep reinforcement learning. During the deployment phase, action masks exclude some nodes, so that controllers are not placed on nodes with insufficient performance or an inconvenient power supply. Simulation experiments show that the proposed algorithm finds higher-performance placement schemes than placement algorithms based on Louvain community partitioning or on a single-agent Deep Q-Network (DQN).

Key words: Software-Defined Internet of Things (SD-IoT), controller placement, multi-agent deep reinforcement learning, action mask, Markov game
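
The action-mask step described in the abstract can be illustrated with a minimal sketch. It assumes each agent scores candidate nodes with Q-values and that a boolean mask marks which nodes are eligible to host a controller. The function name `masked_greedy_action`, the example values, and the greedy selection rule are hypothetical illustrations, not the authors' implementation.

```python
import math


def masked_greedy_action(q_values, allowed):
    """Pick the index of the highest-scoring node among the allowed ones.

    q_values: per-node scores (hypothetical Q-values from one agent).
    allowed:  per-node booleans; False marks a node that must not host
              a controller (e.g. weak hardware or inconvenient power).
    """
    if len(q_values) != len(allowed):
        raise ValueError("q_values and allowed must have the same length")
    if not any(allowed):
        raise ValueError("the mask excludes every node")

    # Masked-out nodes get -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, allowed)]
    return max(range(len(masked)), key=masked.__getitem__)


# Node 1 has the best raw score but is masked out, so node 3 is chosen.
q_values = [0.2, 0.9, 0.5, 0.7]
allowed = [True, False, True, True]
print(masked_greedy_action(q_values, allowed))  # 3
```

Setting masked scores to negative infinity, rather than penalizing them in the reward, guarantees that an ineligible node is never selected, regardless of how the scores were learned.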
