Computer Engineering

Multi-Agent Cooperative Caching Strategy Based on the TD3 Algorithm

  • Published: 2024-04-18

Abstract: To reduce content acquisition delay and transmission overhead in mobile edge networks, this paper proposes a multi-agent cooperative caching algorithm (MACC) based on the twin delayed deep deterministic policy gradient (TD3) algorithm. First, a multi-agent edge caching model is constructed, and the multi-node cache replacement problem is modeled as a partially observable Markov decision process (POMDP). The cache states and content request information of adjacent nodes are incorporated into each node's observation space to improve the agent's perception of the environment, and triple exponential smoothing is used to extract the popularity features of each node's content requests, so that the algorithm adapts to changes in content popularity and achieves a higher cache hit rate. Second, a guiding reward function is designed that combines the transmission delay and overhead of the local node and its adjacent nodes, steering the agents toward cooperative caching and reducing cache redundancy and content transmission overhead across the system. Finally, TD3 is extended to the multi-agent setting using the Wolpertinger architecture, so that each edge node adaptively learns its own caching policy, improving overall system performance. Experimental results show that, under MACC, edge nodes give up part of their cache space to cache content requested at adjacent nodes, which raises the cache hit rate. On the same dataset, MACC improves the average cache hit rate by 8.50%, 13.91%, and 29.21% over the MAAC, DDPG, and independent TD3 algorithms, respectively, and it adapts to dynamic edge environments while achieving low content acquisition delay and transmission overhead.
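The popularity-extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which form of triple exponential smoothing is used, so the code below assumes Brown's non-seasonal (quadratic-trend) variant. The function name `brown_triple_smoothing_forecast`, the smoothing factor `alpha`, and the per-slot request counts are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def brown_triple_smoothing_forecast(
    requests: Sequence[float], alpha: float = 0.5, horizon: int = 1
) -> float:
    """Forecast the request count `horizon` slots ahead for one content item.

    Assumption: Brown's triple (quadratic-trend) exponential smoothing;
    the paper may use a different variant or initialization.
    """
    if len(requests) == 0:
        raise ValueError("requests must contain at least one observation")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # Initialize all three smoothed series with the first observation.
    s1 = s2 = s3 = float(requests[0])
    for x in requests[1:]:
        s1 = alpha * x + (1 - alpha) * s1    # first-order smoothing
        s2 = alpha * s1 + (1 - alpha) * s2   # smoothing of s1
        s3 = alpha * s2 + (1 - alpha) * s3   # smoothing of s2

    # Level (a), linear trend (b), and quadratic trend (c) coefficients.
    k = alpha / (2 * (1 - alpha) ** 2)
    a = 3 * s1 - 3 * s2 + s3
    b = k * ((6 - 5 * alpha) * s1 - 2 * (5 - 4 * alpha) * s2 + (4 - 3 * alpha) * s3)
    c = k * alpha * (s1 - 2 * s2 + s3)
    return a + b * horizon + c * horizon ** 2


# Usage: per-slot request counts observed at one edge node for one item.
history = [12, 15, 19, 24, 30, 37]
predicted_popularity = brown_triple_smoothing_forecast(history, alpha=0.4)
```

In a setup like the one the abstract describes, a forecast of this kind could be computed per content item and appended to each node's observation vector. How the paper actually encodes the popularity feature is not stated in the abstract.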