
Computer Engineering


An Online Reinforcement Learning Task Scheduling Algorithm Based on Policy Entropy Supervision


  • Published: 2026-03-17


Abstract: In cloud computing environments, workloads and resource states change continuously over time, which often causes reinforcement-learning-based scheduling policies to suffer from unstable randomness during online execution, leading to increased energy consumption or degraded response time. Conventional Soft Actor–Critic (SAC) mainly relies on temperature tuning during training to control policy randomness, and thus struggles to adapt promptly to non-stationary workloads in real systems. To address this issue, this paper proposes an entropy-supervised Soft Actor–Critic algorithm for online cloud task scheduling, referred to as ESAC. Without altering the original training structure, ESAC introduces a policy entropy supervision mechanism during inference to monitor policy randomness in real time and triggers lightweight entropy feedback fine-tuning when the entropy deviates from a stable range, enabling fast correction with constant computational cost. In addition, sliding-window reward normalization and periodic incremental updates are employed to alleviate numerical instability caused by reward scale drift under dynamic workloads. Experiments based on dynamic workload simulations constructed from the Alibaba Cluster Trace 2018 demonstrate that ESAC consistently outperforms several representative scheduling algorithms under different load intensities and burst scenarios, reducing the average energy consumption per task by about 1.8% and the average response time by up to 3.01%. Compared with the A2C baseline, ESAC achieves improvements of 70.7%, 76.0%, and 76.2% in the composite performance metric under three load scenarios, while maintaining acceptable online scheduling overhead. These results verify the effectiveness of the proposed method in enhancing the stability and adaptability of online scheduling in non-stationary cloud environments.
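The abstract describes two inference-time mechanisms: monitoring policy entropy against a stable band to trigger lightweight fine-tuning, and sliding-window reward normalization to counter reward-scale drift. The following is a minimal sketch of those two ideas only; the class names, thresholds, and window sizes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def policy_entropy(probs, eps=1e-12):
    """Shannon entropy of a discrete action distribution."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    p = p / p.sum()  # renormalize after clipping
    return float(-(p * np.log(p)).sum())

class EntropyMonitor:
    """Tracks policy entropy at inference time and flags when its
    windowed mean drifts out of a stable band [low, high], which
    would signal a lightweight entropy-feedback fine-tune.
    (Band and window are assumed values for illustration.)"""
    def __init__(self, low, high, window=50):
        self.low, self.high = low, high
        self.window = window
        self.history = []

    def update(self, probs):
        self.history.append(policy_entropy(probs))
        if len(self.history) > self.window:
            self.history.pop(0)
        mean_h = sum(self.history) / len(self.history)
        # True => entropy left the stable band; trigger fine-tuning.
        return not (self.low <= mean_h <= self.high)

class RewardNormalizer:
    """Sliding-window reward normalization, intended to mitigate
    reward-scale drift under non-stationary workloads."""
    def __init__(self, window=200):
        self.window = window
        self.buf = []

    def normalize(self, r):
        self.buf.append(float(r))
        if len(self.buf) > self.window:
            self.buf.pop(0)
        mu = np.mean(self.buf)
        sigma = np.std(self.buf) + 1e-8  # avoid division by zero
        return (r - mu) / sigma
```

Both operations run in constant time per scheduling decision (bounded-size buffers), which is consistent with the constant-cost online overhead the abstract claims; the fine-tuning step itself is not shown here.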
