
Computer Engineering


SPADE: Reinforcement Learning UAV Decision-Making Under Delayed Conditions


Online: 2026-01-27 | Published: 2026-01-27


Abstract: Recent advances in Deep Reinforcement Learning (DRL) have demonstrated strong capabilities in Unmanned Aerial Vehicle (UAV) decision-making, achieving significant results across a variety of UAV control tasks. However, existing DRL studies on UAV control mostly assume an idealized "zero-delay" environment, overlooking the signal delays ubiquitous in real-world links. Directly transferring generic delay-handling methods validated on MuJoCo-like benchmarks to highly dynamic UAV platforms often fails to reproduce their original benefits and can even aggravate performance degradation. Through simulation experiments, this study verifies the significant performance decline of DRL-trained UAV agents under signal delays and analyzes its causes in depth. We propose the State-Prediction and Adaptive Decision-Enhanced (SPADE) framework, which effectively handles both fixed and non-fixed delays. In a 1v1 close-range combat task, SPADE achieves control performance comparable to that of a delay-free system. Within a delay range of 300–2400 milliseconds, SPADE improves the average win rate over baseline methods by 19.4% under fixed delays and 13.0% under non-fixed delays. It largely compensates for the delay-induced performance degradation of the Soft Actor-Critic (SAC) algorithm and outperforms existing methods for mitigating delay effects in DRL-based UAV control. Overall, this research highlights the negative impact of signal delay on UAV control systems and introduces SPADE as a robust solution, significantly improving UAV control performance under delayed conditions.
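The abstract's core idea of delay compensation via state prediction can be illustrated with a minimal sketch. This is not the authors' SPADE implementation; the toy dynamics model, the fixed delay of `k` steps, and all function names are assumptions for illustration. The agent receives an observation that is `k` control steps old, so it rolls a forward model over the buffer of actions it has issued since that observation to estimate the current state before deciding:

```python
# Hypothetical sketch of delay compensation by state prediction.
# Assumption: a known (or learned) forward model and a fixed k-step delay.
from collections import deque

def step_dynamics(state, action):
    """Toy linear dynamics standing in for a learned forward model."""
    pos, vel = state
    return (pos + 0.1 * vel, vel + 0.1 * action)

def predict_current_state(delayed_state, action_buffer):
    """Roll the model forward over every action sent since the delayed observation."""
    state = delayed_state
    for a in action_buffer:
        state = step_dynamics(state, a)
    return state

# Simulate a fixed delay of k = 3 control steps.
k = 3
action_buffer = deque(maxlen=k)

true_state = (0.0, 1.0)
history = [true_state]
for a in [0.5, -0.2, 0.1]:      # actions already sent but not yet observed
    action_buffer.append(a)
    true_state = step_dynamics(true_state, a)
    history.append(true_state)

delayed_obs = history[0]         # what the agent actually receives
estimate = predict_current_state(delayed_obs, action_buffer)
print(estimate)
```

Under an exact model the rolled-forward estimate coincides with the true current state; with a learned model it would only approximate it, which is why SPADE pairs prediction with an adaptive decision component.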
