
Computer Engineering (计算机工程)


Zero-Shot Skeleton Action Recognition via Dual Discriminators and Spatiotemporal Self-Calibration

  • Published: 2025-11-13

Abstract: Zero-shot skeleton-based action recognition uses textual label descriptions together with skeleton action sequences to distinguish actions from both seen and unseen classes. Existing methods are typically limited by the low quality of the generated visual features; because these features cannot be aligned accurately with the semantics, recognition of similar actions suffers. To address this problem, this paper proposes a method based on dual discriminators and spatiotemporal self-calibration (DD-STSC) to explore visual-semantic alignment. The method combines a variational autoencoder with a generative adversarial network and trains the discriminators and the generator adversarially to mine the discriminative information among different features. During disentanglement it also separates useful information from useless information more effectively, further improving the quality of the generated samples. In addition, an action self-calibration module (ASCM) is introduced: by learning skeleton information along the spatial and temporal dimensions, it captures the key motion information more effectively and thereby improves classification accuracy. Experiments on the public datasets NTU60, NTU120, and PKU51 show that the proposed method outperforms existing mainstream methods.
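As a rough illustration of the general zero-shot setting described in the abstract (not the DD-STSC method itself, whose details are not given here), the sketch below assigns a skeleton feature to the unseen class whose text embedding is most similar to it. The function names `cosine` and `zero_shot_predict`, the class labels, and all vector values are hypothetical and chosen only for the example; it assumes the visual feature and the text embeddings already live in a shared space and are non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(visual_feature, class_text_embeddings):
    """Return the label whose text embedding best matches the visual feature.

    Hypothetical helper: `class_text_embeddings` maps each unseen class
    label to an embedding of its text description.
    """
    return max(
        class_text_embeddings,
        key=lambda label: cosine(visual_feature, class_text_embeddings[label]),
    )


# Toy text embeddings for two unseen classes (illustrative values only).
unseen_classes = {
    "drink water": [1.0, 0.0, 0.2],
    "brush teeth": [0.0, 1.0, 0.1],
}

# Toy skeleton feature, assumed to be already projected into the same space.
skeleton_feature = [0.9, 0.1, 0.3]

predicted = zero_shot_predict(skeleton_feature, unseen_classes)
print(predicted)
```

In this simplified view, better visual-semantic alignment means that features of similar actions land closer to their own class description than to a neighbouring one, which is the property the abstract says the proposed method aims to improve.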