
Computer Engineering ›› 2025, Vol. 51 ›› Issue (2): 365-374. doi: 10.19678/j.issn.1000-3428.0068639

• Development Research and Engineering Application •

Multi-agent Cooperative Caching Strategy Based on TD3 Algorithm

ZENG Jianzhou*, LI Zeping, ZHANG Suqin

  1. State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2023-10-19 Published: 2025-02-15 Online: 2024-04-18
  • Corresponding author: ZENG Jianzhou
  • Funding:
    National Natural Science Foundation of China (61462014)

Abstract:

To reduce content-acquisition delay and transmission overhead in mobile edge networks, a Multi-Agent Cooperative Caching (MACC) strategy based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is proposed. First, a multi-agent edge caching model is constructed, and the multi-node cache-replacement problem is modeled as a Partially Observable Markov Decision Process (POMDP). The cache states and content-request information of adjacent nodes are incorporated into each node's observation space to improve the agent's perception of the environment, and the popularity features of each node's content requests are extracted by triple exponential smoothing, so that the algorithm adapts to changes in content popularity and improves the cache hit rate. Second, a guiding reward function is designed from the transmission delay and overhead of the local and adjacent nodes; it steers the agents toward cooperative caching and reduces the system's cache redundancy and content-transmission overhead. Finally, TD3 is extended to the multi-agent setting in combination with the Wolpertinger architecture, so that each edge node adaptively learns its own caching policy and system performance improves. Experimental results show that, under MACC, edge nodes give up part of their cache space to cache content requested at neighboring nodes, which raises the cache hit rate. Compared with the MAAC, DDPG, and independent TD3 algorithms on the same dataset, MACC improves the cache hit rate by an average of 8.50%, 13.91%, and 29.21%, respectively. MACC also adapts to a dynamic edge environment and achieves low content-acquisition delay and transmission overhead.
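
The popularity-extraction step can be illustrated with a minimal sketch. It assumes Brown's single-parameter form of triple exponential smoothing; the abstract does not say which variant, smoothing coefficient, or time-slot length the authors use, so the function name `triple_exp_smooth_forecast`, the default `alpha`, and the initialisation with the first observation are all illustrative choices rather than the paper's settings.

```python
from typing import Sequence


def triple_exp_smooth_forecast(
    counts: Sequence[float], alpha: float = 0.5, horizon: int = 1
) -> float:
    """Forecast a content's request count `horizon` slots ahead (Brown's method).

    `counts` holds the per-slot request counts of one content at one edge node.
    The function name and defaults are hypothetical, not taken from the paper.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # Assumption: all three smoothed series start at the first observation.
    s1 = s2 = s3 = float(counts[0])
    for x in counts[1:]:
        s1 = alpha * x + (1 - alpha) * s1   # single smoothing of the raw counts
        s2 = alpha * s1 + (1 - alpha) * s2  # smoothing of s1
        s3 = alpha * s2 + (1 - alpha) * s3  # smoothing of s2

    # Coefficients of the quadratic forecast a + b*m + c*m^2.
    k = alpha / (2 * (1 - alpha) ** 2)
    level = 3 * s1 - 3 * s2 + s3
    trend = k * ((6 - 5 * alpha) * s1 - 2 * (5 - 4 * alpha) * s2 + (4 - 3 * alpha) * s3)
    curvature = alpha * k * (s1 - 2 * s2 + s3)
    return level + trend * horizon + curvature * horizon ** 2
```

A forecast of this kind could be computed per content and appended to a node's observation vector, so that a content whose requests are rising is scored higher than one with the same recent count but a falling trend.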

Key words: mobile edge network, multi-agent, cooperative caching, deep reinforcement learning, TD3 algorithm
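
The role of the Wolpertinger architecture in the abstract can be sketched as follows. In the general Wolpertinger scheme, a deterministic actor such as TD3's outputs a continuous "proto-action", the k nearest valid discrete actions are retrieved, and the critic picks the highest-valued candidate. The code below is a sketch of that general scheme only: the action embedding, the value of `k`, and the `q_value` callable are assumptions, since the abstract does not describe how MACC encodes cache-replacement actions.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def wolpertinger_select(
    state: Vector,
    proto_action: Vector,
    actions: Sequence[Vector],
    q_value: Callable[[Vector, Vector], float],
    k: int = 3,
) -> int:
    """Return the index of the discrete action chosen for `state`.

    `proto_action` stands for the actor's continuous output and `q_value`
    for a critic Q(s, a); both are hypothetical placeholders here.
    """
    if not actions:
        raise ValueError("actions must be non-empty")

    # Step 1: the k discrete actions closest (Euclidean) to the proto-action.
    nearest = sorted(
        range(len(actions)),
        key=lambda i: math.dist(proto_action, actions[i]),
    )[:k]

    # Step 2: the critic scores each candidate; keep the highest-valued one.
    return max(nearest, key=lambda i: q_value(state, actions[i]))
```

With `k = 1` this reduces to rounding the actor's output to the nearest valid action; a larger `k` lets the critic override the actor when a nearby action has a higher estimated value, at the cost of `k` critic evaluations per decision.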