Computer Engineering (计算机工程)

基于潜在空间HSIC正则化的图像解耦生成方法

Image Disentangling Generation Method based on Latent Space HSIC Regularization

  • Published: 2025-07-01

Abstract: Learning disentangled representations to make image generation models more controllable is an important research direction in computer vision. Existing disentangled representation learning methods, however, have two major limitations: they rely on large-scale annotated data, and they struggle to handle complex dependencies among features. To overcome these limitations, this study proposes a general disentangled generation method based on the Hilbert-Schmidt Independence Criterion (HSIC). The method turns HSIC, a nonparametric statistical measure of dependence, into an independence regularizer on the latent space of generative models: adding an HSIC regularization term to the training objective penalizes nonlinear dependence among latent features and guides the model toward independent feature representations. Specifically, this study integrates HSIC into the optimization of two mainstream families of generative models. For variational autoencoders (VAEs), the variational inference reconstruction objective is combined with the HSIC regularization term to improve the disentanglement of the latent distribution. For diffusion models (DMs), the HSIC regularization term is embedded in the time-step optimization of the reverse process, so that features are disentangled progressively. Experimental results show that this method, which can be applied across different model architectures, improves the independence of latent representations and maintains stable performance in unsupervised settings, offering a new way to model complex dependencies among features. To further verify the semantic consistency of the disentangled space, this study conducts latent space interpolation experiments; the resulting generation trajectories are smoother, indicating that HSIC regularization builds a linearly separable disentangled space. For evaluation, this study uses both standard disentanglement metrics and custom HSIC-based metrics for dual validation; the two are positively correlated, which supports the objectivity of the disentanglement evaluation criteria.
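To make the idea of an HSIC regularizer on latent features concrete, the following is a minimal NumPy sketch of the standard biased empirical HSIC estimator, trace(KHLH)/(n-1)^2, with Gaussian kernels. The abstract does not state which estimator, kernel, bandwidth, or pairing of latent dimensions the authors use, so all of these are assumptions made for illustration. The function names (`rbf_gram`, `hsic_biased`, `pairwise_hsic_penalty`) are hypothetical.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for samples x of shape (n, d)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_biased(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    Assumption: Gaussian kernels with a shared bandwidth sigma.
    """
    n = x.shape[0]
    K = rbf_gram(x, sigma)
    L = rbf_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

def pairwise_hsic_penalty(z, sigma=1.0):
    """Sum of HSIC over all pairs of latent dimensions of z, shape (n, d).

    Assumption: one possible way to penalize dependence among latent
    features; the paper's actual regularizer may differ.
    """
    d = z.shape[1]
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            total += hsic_biased(z[:, i:i + 1], z[:, j:j + 1], sigma)
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 1))
    independent = rng.normal(size=(200, 1))
    dependent = x ** 2  # nonlinear dependence, near-zero linear correlation
    print("HSIC(x, independent):", hsic_biased(x, independent))
    print("HSIC(x, x^2):        ", hsic_biased(x, dependent))
```

In a training setting, a penalty of this kind would be computed on a batch of latent codes and added to the model's loss with a weighting coefficient, which requires a differentiable framework; the NumPy version here only illustrates the statistic itself.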