
Computer Engineering ›› 2021, Vol. 47 ›› Issue (9): 34-43. doi: 10.19678/j.issn.1000-3428.0059295

• Hot Topics and Reviews •

Dynamic Spectrum Resource Allocation in Internet of Vehicles Based on SAC Reinforcement Learning

HUANG Yufan1, PENG Nuoheng1, LIN Yan1, FAN Jiancun2, ZHANG Yijin1, YU Yanqiu1

  1. School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China;
  2. School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China
  • Received: 2020-08-18  Revised: 2020-11-08  Published: 2021-09-13
  • About the authors: HUANG Yufan (born 1998), female, master's degree candidate; her research interests include resource allocation in the Internet of Things and network security. PENG Nuoheng, master's degree candidate; LIN Yan (corresponding author), lecturer; FAN Jiancun, associate professor and doctoral supervisor; ZHANG Yijin, professor; YU Yanqiu, master's degree candidate.
  • Funding: National Natural Science Foundation of China (62001225, 62071236); Fundamental Research Funds for the Central Universities (30920021127, 30919011227); Natural Science Foundation of Jiangsu Province Youth Program (BK20190454).


Abstract: To address the scarcity of spectrum resources in the Internet of Vehicles (IoV), a multi-agent dynamic spectrum allocation scheme based on Soft Actor-Critic (SAC) reinforcement learning is proposed. The scheme aims to maximize the total channel capacity and the payload delivery success rate. To this end, a spectrum resource allocation model for Vehicle-to-Vehicle (V2V) links is constructed. Each V2V link is treated as an agent, and the allocation problem is modeled as a multi-agent Markov decision process. The SAC reinforcement learning algorithm is then used to design a neural network, and the agents are trained by maximizing entropy together with the cumulative reward, so that the V2V links learn to optimize spectrum resource allocation over successive rounds of training. Simulation results show that, compared with spectrum resource allocation schemes based on Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), the proposed scheme shares spectrum among V2V links more efficiently and achieves a higher channel transmission rate and payload delivery success rate.
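The entropy-regularized objective behind SAC can be made concrete with a small sketch. The Python snippet below is illustrative only, not the authors' implementation: for one hypothetical V2V agent with a discrete (sub-band, power-level) action space, it shows a softmax policy, the soft state value with the entropy bonus, the one-step soft Bellman target the critic would regress toward, and a reward that mixes channel capacity with payload-delivery success as named in the abstract. All dimensions, weights, and helper names (e.g. `soft_state_value`, `v2v_reward`) are assumptions.

```python
import numpy as np

N_SUBBANDS, N_POWER_LEVELS = 4, 3            # assumed discrete action space: 12 (sub-band, power) pairs
N_ACTIONS = N_SUBBANDS * N_POWER_LEVELS
STATE_DIM = 8                                # e.g. channel gains, interference, remaining payload, time budget
ALPHA, GAMMA = 0.2, 0.99                     # entropy temperature and discount factor (assumed values)

rng = np.random.default_rng(0)
W_pi = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))   # toy linear stand-in for the policy network
W_q = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))    # toy linear stand-in for the Q network

def policy_probs(state):
    """Softmax policy over the discrete (sub-band, power-level) actions."""
    logits = state @ W_pi
    logits -= logits.max()                   # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def soft_state_value(state):
    """Soft value V(s) = E_a[Q(s, a) - ALPHA * log pi(a|s)], i.e. value plus entropy bonus."""
    p = policy_probs(state)
    q = state @ W_q
    return float(np.sum(p * (q - ALPHA * np.log(p + 1e-12))))

def soft_td_target(reward, next_state, done):
    """One-step soft Bellman target used to update the critic."""
    return reward + (0.0 if done else GAMMA * soft_state_value(next_state))

def v2v_reward(capacity_bps, payload_delivered, weight=0.1):
    """Illustrative reward combining channel capacity with payload-delivery success;
    the weighting between the two objectives is an assumption."""
    return weight * capacity_bps / 1e6 + (1.0 if payload_delivered else 0.0)

# One toy interaction step for a single V2V agent
s = rng.normal(size=STATE_DIM)
a = rng.choice(N_ACTIONS, p=policy_probs(s))
subband, power_level = divmod(int(a), N_POWER_LEVELS)
r = v2v_reward(capacity_bps=3.2e6, payload_delivered=False)
s_next = rng.normal(size=STATE_DIM)
print(f"chose sub-band {subband}, power level {power_level}; soft TD target = {soft_td_target(r, s_next, False):.3f}")
```

In a multi-agent setup of the kind described above, each V2V link would hold its own copy of these policy and critic parameters and observe only its local channel state, which is why the problem is framed as a multi-agent Markov decision process rather than a single centralized one.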

Key words: Internet of Vehicles (IoV), resource allocation, multi-agent reinforcement learning, Soft Actor-Critic (SAC) algorithm, spectrum allocation

CLC Number: