KANG Panpan, CAO Yuecheng, TENG Liping, CHEN Junjie, LI Hongjun
Accepted: 2026-05-19
In recent years, self-supervised skeleton-based action recognition has made notable progress, but it still suffers from two types of training bias under strong augmentation. First, imbalanced allocation of local perturbations tends to over-perturb critical motion segments while leaving low-dynamic regions insufficiently varied. Second, in multi-positive contrastive learning, non-target positive samples compete within the normalization term, which creates conflicting objectives and weakens representation aggregation. To address these issues, this paper proposes DCD-CLR (Dual-end Collaborative Debiasing Contrastive Representation Learning), a self-supervised contrastive learning framework that jointly optimizes view construction and objective construction, improving the quality of skeleton representation learning on both the augmentation-allocation side and the contrastive-objective side. On the view side, Continuous Dynamic Saliency Augmentation (CDSA) combines frame-difference energy with data-level joint motion priors to build a frame-joint dynamic intensity map. It then uses this map to schedule spatiotemporal perturbation magnitudes in a continuous, region-level, and sample-adaptive manner, increasing view diversity while preserving critical motion segments. On the objective side, Target-Isolated InfoNCE (TI-InfoNCE) is a debiased multi-positive contrastive objective that excludes the remaining positive samples from the normalization term of each target positive. This reduces competitive interference among positives and sharpens the boundaries of the representation distribution.
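The abstract does not give formulas for either component, so the following is only a minimal pure-Python sketch of one plausible reading, not the authors' implementation. Assumptions (all hypothetical): the intensity map is the product of a per-frame frame-difference energy and a per-joint prior weight; it is min-max normalized per sample and mapped inversely and linearly to a perturbation magnitude in `[m_min, m_max]`, so highly dynamic cells are perturbed least; region-level smoothing is omitted. For TI-InfoNCE, the sketch assumes cosine similarities for a single anchor, a temperature `tau = 0.2`, and averaging over positives. Every function name and default value here is illustrative.

```python
import math
from typing import List, Sequence

Joint = Sequence[float]   # (x, y, z) coordinates of one joint
Frame = Sequence[Joint]   # all joints of one frame


def frame_difference_energy(seq: Sequence[Frame]) -> List[float]:
    """Per-frame energy: summed squared joint displacement to the next frame.
    The last frame reuses the previous value so there is one entry per frame."""
    energy: List[float] = []
    for t in range(len(seq) - 1):
        energy.append(sum(
            sum((a - b) ** 2 for a, b in zip(j_next, j_cur))
            for j_cur, j_next in zip(seq[t], seq[t + 1])
        ))
    energy.append(energy[-1] if energy else 0.0)
    return energy


def perturbation_schedule(seq: Sequence[Frame], joint_prior: Sequence[float],
                          m_min: float = 0.05, m_max: float = 0.5) -> List[List[float]]:
    """Hypothetical CDSA-style schedule: intensity D[t][j] = energy[t] * prior[j],
    min-max normalized per sample, then mapped inversely so high-dynamic cells
    get magnitudes near m_min and low-dynamic cells near m_max."""
    energy = frame_difference_energy(seq)
    intensity = [[e * w for w in joint_prior] for e in energy]
    flat = [v for row in intensity for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for static clips
    return [[m_max - (m_max - m_min) * (v - lo) / span for v in row]
            for row in intensity]


def ti_infonce(sim_pos: Sequence[float], sim_neg: Sequence[float],
               tau: float = 0.2) -> float:
    """Target-isolated multi-positive InfoNCE for one anchor: the denominator for
    target positive p holds only exp(s_p / tau) plus the negatives."""
    neg_sum = sum(math.exp(s / tau) for s in sim_neg)
    losses = [-math.log(math.exp(s / tau) / (math.exp(s / tau) + neg_sum))
              for s in sim_pos]
    return sum(losses) / len(losses)


def multi_positive_infonce(sim_pos: Sequence[float], sim_neg: Sequence[float],
                           tau: float = 0.2) -> float:
    """Baseline for comparison: all positives share one normalization term."""
    denom = sum(math.exp(s / tau) for s in list(sim_pos) + list(sim_neg))
    return -sum(math.log(math.exp(s / tau) / denom) for s in sim_pos) / len(sim_pos)


if __name__ == "__main__":
    # Toy clip: 3 frames, 2 joints; only joint 1 moves, between frames 0 and 1.
    clip = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
            [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
            [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]]
    print(perturbation_schedule(clip, joint_prior=[1.0, 2.0]))
    pos, neg = [0.9, 0.6], [0.1, -0.2, 0.3]
    print(ti_infonce(pos, neg), multi_positive_infonce(pos, neg))
```

Because each target's denominator drops the other positives' terms, the target-isolated loss is never larger than the shared-denominator baseline and coincides with it when there is only one positive. This makes the difference between the two objectives easy to check numerically.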
Under the linear evaluation protocol, the proposed method achieves recognition accuracies of 85.9%, 79.6%, and 92.6% on NTU60 xsub, NTU120 xset, and PKU-MMD I, respectively. Together with representation distribution visualization, transfer evaluation, and noise interference experiments, these results indicate that the proposed method offers good stability, generalization ability, and robustness.