
Computer Engineering (计算机工程)



Fusing Multiple Self-Supervised Representations by Solving a Feature Regression Task

  • Published:2025-05-09


Abstract: Self-supervised learning has demonstrated strong potential in computer vision tasks. However, effectively fusing the features extracted by multiple self-supervised tasks remains a major open challenge. Traditional multi-task learning methods struggle to integrate heterogeneous self-supervised features due to issues such as input conflicts and architectural incompatibilities, while existing feature fusion methods (e.g., subspace learning) often over-compress the feature space, losing task-specific information. This paper proposes a multi-self-supervised feature fusion method based on a feature regression task, which treats feature fusion as a multi-view learning problem: the goal is to learn a latent space shared across views while maximizing the correlation between the different self-supervised features. The model first treats the multiple self-supervised features as complementary "multi-view" representations and builds a feature interaction network centered on a Transformer encoder. The feature regression task then takes masked features as input and, through self-attention, exploits cross-task correlations to reconstruct the original features, forcing the model to preserve view-specific information while maximizing shared information. The resulting features capture both the shared and the unique information of the image's different views, yielding representations that generalize better. Image classification experiments on several well-known datasets show that the fused features generalize significantly better than the individual pre-fusion features, validating the effectiveness of the proposed fusion method.
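The masked feature-regression idea described above can be sketched in code. The following is a minimal PyTorch illustration, not the authors' implementation: all module names, hyperparameters (embedding size, number of heads/layers, mask ratio), and the mean-pooled fused representation are my own assumptions. Each image contributes one feature vector per self-supervised encoder; each vector becomes a "view" token, a random subset of tokens is replaced by a learnable mask embedding, and a Transformer encoder must regress the original features from the surviving views, which requires modeling cross-task correlations without discarding view-specific information.

```python
# Hypothetical sketch of the feature-regression fusion task described in the
# abstract; architecture details are assumptions, not the published code.
import torch
import torch.nn as nn


class FeatureRegressionFusion(nn.Module):
    """Fuse features from several self-supervised encoders ("views").

    Each per-image feature vector is one token. Masked tokens are replaced
    by a learnable mask embedding, and a Transformer encoder reconstructs
    the original features from the remaining views via self-attention.
    """

    def __init__(self, num_views, feat_dim, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)          # per-view embedding
        self.view_embed = nn.Parameter(torch.zeros(num_views, d_model))
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, feat_dim)          # regression head

    def forward(self, feats, mask):
        # feats: (B, V, feat_dim); mask: (B, V) bool, True = masked view.
        x = self.proj(feats)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        x = x + self.view_embed                           # tells tokens apart
        z = self.encoder(x)                               # cross-view attention
        return self.head(z), z                            # reconstruction, fused tokens


def regression_loss(model, feats, mask_ratio=0.3):
    """MSE on the masked views only, as in masked-feature modeling."""
    mask = torch.rand(feats.shape[:2]) < mask_ratio
    recon, _ = model(feats, mask)
    if mask.any():
        return ((recon - feats) ** 2)[mask].mean()
    return (recon * 0.0).sum()                            # no view masked this step
```

After training, a fused image representation could be obtained by, e.g., mean-pooling the output tokens `z` over the view dimension; the abstract itself does not specify this pooling step, so it is an assumption here.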