Computer Engineering (计算机工程)

Review of Deep Learning Methods for Short-Term Action Anticipation

  • Online:2025-08-21 Published:2025-08-21

Abstract: Short-term action anticipation, an important task in video understanding, models the spatiotemporal and semantic features of historical actions in order to turn observed physical motions into inferences about intentions and goals, and thereby to accurately predict interactive behaviors within the next few seconds. It has broad application prospects in human-machine collaboration, security surveillance, autonomous driving, and augmented reality. In recent years, with breakthroughs in deep learning for video understanding, particularly in feature extraction models and high-quality datasets, short-term action anticipation has shifted from a knowledge-driven machine learning paradigm to a data-driven deep learning paradigm. This survey systematically reviews recent deep learning methods for the task, aiming to provide a reference for related research and for the analysis of practical applications. It first builds a classification framework along three dimensions: model architecture innovation, training strategy application, and context modeling methods. Within this framework it analyzes the key technologies and challenges of the field and describes the characteristics, applicable scenarios, and research progress of each category of methods. It then briefly summarizes the datasets commonly used for the task and compares the performance of multiple methods on mainstream datasets. Finally, it sets out the challenges currently facing the field and discusses possible future research directions, including multi-view collaborative prediction, real-time model inference validation, weakly supervised learning from untrimmed data, few-shot class-incremental generalization, dynamic open-scene adaptation, and variable time interval anticipation.