Image Disentangling Generation Method Based on Latent Space HSIC Regularization

doi:10.19678/j.issn.1000-3428.0252270

Abstract

Learning disentangled representations to improve the controllability of image generation models is an important research direction in computer vision. Existing disentangled representation learning methods, however, have two major limitations: they depend on large-scale annotated data, and they struggle to handle complex dependencies among features. To overcome these limitations, this study proposes a general disentangled generation method based on the Hilbert-Schmidt Independence Criterion (HSIC). The method turns HSIC, a nonparametric statistical measure of dependence, into an independence regularizer for the latent space of generative models: adding an HSIC term to the training objective penalizes nonlinear dependence among latent features and guides the model toward independent feature representations. Specifically, HSIC is integrated into the optimization of two mainstream classes of generative models. For variational autoencoders (VAEs), the HSIC regularization term is combined with the variational-inference reconstruction objective to improve the disentanglement of the latent distribution. For diffusion models (DMs), the HSIC regularization term is embedded in the time-step optimization of the reverse process, so that features are disentangled progressively. Experimental results show that this method, which can be implemented across different model architectures, improves the independence of latent representations and maintains stable performance in unsupervised settings, offering a new way to model complex dependencies among features. To further verify the semantic consistency of the disentangled space, latent space interpolation experiments were conducted; they yield smoother generation trajectories, indicating that HSIC regularization constructs a linearly separable disentangled space. For evaluation, standard disentanglement metrics and a custom HSIC-based metric were used for dual validation; the two are positively correlated, which supports the objectivity of the disentanglement evaluation criteria.
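To make the idea of an HSIC independence regularizer concrete, the following is a minimal NumPy sketch of the standard biased empirical HSIC estimator, trace(KHLH)/(n-1)^2, applied pairwise to latent dimensions. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, and the helper names `rbf_kernel`, `hsic`, and `pairwise_hsic_penalty` are all hypothetical choices, and the abstract does not specify how the penalty is weighted or attached to the VAE or diffusion objective.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for samples x of shape (n, d).

    The kernel choice and bandwidth are assumptions made for this sketch.
    """
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x (n, dx) and y (n, dy)."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix H
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


def pairwise_hsic_penalty(z, sigma=1.0):
    """Sum of HSIC over all pairs of latent dimensions of z (n, d).

    A hypothetical regularizer: small values suggest the latent
    dimensions are close to pairwise independent on this batch.
    """
    d = z.shape[1]
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            total += hsic(z[:, i:i + 1], z[:, j:j + 1], sigma)
    return total
```

In a training loop, a penalty of this kind would typically be computed on a batch of latent codes and added to the model's loss with a weighting coefficient; in practice it would be written in an automatic-differentiation framework so that gradients flow to the encoder. Because HSIC is kernel-based, it responds to nonlinear dependence (for example between a variable and its square) that a covariance penalty would miss.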

Yuanhao Li, Fangli Ying. Image Disentangling Generation Method Based on Latent Space HSIC Regularization[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252270.

