Multi-agent Cooperative Caching Strategy Based on TD3 Algorithm

doi:10.19678/j.issn.1000-3428.0068639

Abstract:

To reduce content-acquisition delay and transmission overhead in mobile edge networks, this paper proposes a Multi-Agent Cooperative Caching (MACC) strategy based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm. First, a multi-agent edge caching model is constructed, and the multi-node cache-replacement problem is modeled as a Partially Observable Markov Decision Process (POMDP). The cache states and content-request information of adjacent nodes are incorporated into each node's observation space to improve the agent's perception of the environment, and the popularity features of each node's content requests are extracted with triple exponential smoothing, so that the algorithm can adapt to changes in content popularity and thereby improve the cache hit rate. Second, a guiding reward function is designed that combines the transmission delay and overhead of the local node and its adjacent nodes; it steers the agents toward cooperative caching and reduces the system's cache redundancy and content-transmission overhead. Finally, TD3 is extended to the multi-agent setting in combination with the Wolpertinger architecture, so that each edge node can adaptively learn its own caching policy and improve system performance. Experimental results show that edge nodes under MACC give up part of their cache space to help adjacent nodes cache requested content, which raises the cache hit rate. On the same dataset, the cache hit rate of MACC is on average 8.50%, 13.91%, and 29.21% higher than that of the MAAC, DDPG, and independent TD3 algorithms, respectively. MACC also adapts to dynamic edge environments and achieves lower content-acquisition delay and transmission overhead.

Key words: mobile edge network, multi-agent, cooperative caching, deep reinforcement learning, TD3 algorithm
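
The abstract names the Wolpertinger architecture as the bridge between TD3's continuous actor output and the discrete cache-replacement decision. The following is a minimal sketch of the general Wolpertinger selection step, not the paper's implementation: the actor's continuous "proto-action" is mapped to its k nearest discrete actions, and the critic picks the best of those candidates. The function name `wolpertinger_select`, the 2-D action embeddings, and the toy Q-values are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def wolpertinger_select(
    proto_action: Vector,
    action_embeddings: Sequence[Vector],
    q_value: Callable[[int], float],
    k: int,
) -> int:
    """Return the index of the discrete action chosen Wolpertinger-style.

    1. Take the k discrete actions whose embeddings lie closest to the
       actor's continuous proto-action.
    2. Among those k candidates, return the one the critic scores highest.
    """
    if not 1 <= k <= len(action_embeddings):
        raise ValueError("k must be between 1 and the number of actions")
    # Rank all discrete actions by Euclidean distance to the proto-action.
    by_distance = sorted(
        range(len(action_embeddings)),
        key=lambda i: math.dist(proto_action, action_embeddings[i]),
    )
    candidates = by_distance[:k]
    # Refine with the critic: keep the candidate with the largest Q-value.
    return max(candidates, key=q_value)


# Toy usage: four hypothetical cache-replacement actions in a 2-D space.
embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_q = {0: 0.2, 1: 0.9, 2: 0.5, 3: 1.5}  # stand-in for a trained critic
proto = (0.9, 0.2)                        # stand-in for the actor output

chosen = wolpertinger_select(proto, embeddings, toy_q.__getitem__, k=2)
print(chosen)
```

With `k=1` the sketch reduces to plain nearest-neighbor rounding of the actor output; a larger `k` lets the critic override the nearest action when a nearby one has a higher estimated value, at the cost of more critic evaluations per step.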

ZENG Jianzhou, LI Zeping, ZHANG Suqin. Multi-agent Cooperative Caching Strategy Based on TD3 Algorithm[J]. Computer Engineering, 2025, 51(2): 365-374.

https://www.ecice06.com/CN/Y2025/V51/I2/365

Figures/Tables (7)

Fig.1 Distributed edge network model

Fig.2 Multi-agent edge cache system

Fig.3 TD3 model combined with the Wolpertinger architecture

Fig.4 Number of content requests in MovieLens during the first 300 time periods

Fig.5 Changes in cache hit rate and average transmission overhead under different smoothing parameters

Fig.6 Cumulative cache hits over time

Fig.7 Performance of each caching strategy at different edge cache capacities
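
The smoothing parameter varied in Fig.5 presumably belongs to the triple exponential smoothing step that the abstract uses to extract content popularity. The paper's exact formulation is not reproduced on this page, so the sketch below assumes Brown's single-parameter form, in which the series is smoothed three times with the same `alpha` and the results are combined into a quadratic forecast. The function name, the first-observation initialization, and the sample request counts are illustrative assumptions.

```python
from typing import Sequence


def brown_triple_smoothing_forecast(
    requests: Sequence[float], alpha: float, horizon: int = 1
) -> float:
    """Forecast `horizon` steps ahead with Brown's triple exponential smoothing."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not requests:
        raise ValueError("requests must not be empty")

    # Initialise all three smoothed values with the first observation
    # (one common choice; other initialisations exist).
    s1 = s2 = s3 = float(requests[0])
    for x in requests[1:]:
        s1 = alpha * x + (1.0 - alpha) * s1   # first-order smoothing
        s2 = alpha * s1 + (1.0 - alpha) * s2  # smoothing of s1
        s3 = alpha * s2 + (1.0 - alpha) * s3  # smoothing of s2

    # Level, trend, and curvature coefficients of the quadratic forecast.
    level = 3.0 * s1 - 3.0 * s2 + s3
    trend = (alpha / (2.0 * (1.0 - alpha) ** 2)) * (
        (6.0 - 5.0 * alpha) * s1
        - (10.0 - 8.0 * alpha) * s2
        + (4.0 - 3.0 * alpha) * s3
    )
    curve = (alpha ** 2 / (2.0 * (1.0 - alpha) ** 2)) * (s1 - 2.0 * s2 + s3)
    return level + trend * horizon + curve * horizon ** 2


# Hypothetical per-slot request counts for one content item at one edge node.
counts = [12, 15, 19, 24, 30, 37]
print(brown_triple_smoothing_forecast(counts, alpha=0.5))
```

A larger `alpha` weights recent time slots more heavily and reacts faster to popularity shifts, while a smaller `alpha` gives a smoother but slower estimate; this is the kind of trade-off a sweep over smoothing parameters would expose.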
