FDR-Net: Multimodal Medical Image Registration via Explicit Feature Disentanglement and Reconstruction Constraints

doi:10.19678/j.issn.1000-3428.0260192

Abstract

Multimodal medical image registration aims to spatially align anatomical structures across imaging modalities. Because imaging mechanisms differ, intensity distributions and texture characteristics vary substantially between modalities, and existing methods lose accuracy and robustness in complex scenarios. Unsupervised feature disentanglement approaches have recently reduced the reliance on registration labels, but without explicit constraints they often suppress modality-specific information insufficiently and can degrade key anatomical structures. Eliminating modality discrepancies while preserving structural integrity therefore remains a central challenge in multimodal medical image registration.

To address this issue, this paper proposes the Feature Decoupling and Structural Reconstruction Network (FDR-Net), a closed-loop framework consisting of feature disentanglement, deformation estimation, and reconstruction verification. A feature encoder with global self-attention explicitly decomposes each input image into a modality-related style representation and a modality-invariant structural representation. A modality discrimination constraint encourages the removal of style information from the structural features. A cross-modal feature mixing strategy artificially introduces modality perturbations, which makes the structural representations more robust to modality variation. In the registration stage, a U-Net-based architecture predicts a dense deformation field from the disentangled structural features.
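As an illustration of what a dense deformation field does once it has been predicted, the following minimal sketch warps a 2D image with a per-pixel displacement field using bilinear interpolation. It is not the paper's implementation: the function name `warp_bilinear`, the backward-sampling convention, the pixel units, and the border clamping are all assumptions chosen for the example, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math


def warp_bilinear(image, flow):
    """Warp a 2D image with a dense displacement field (hypothetical helper).

    image: H x W list of rows of numbers.
    flow:  H x W list of (dy, dx) displacements in pixels. Output pixel
           (y, x) samples the input at (y + dy, x + dx) (backward warping).
    Sampling positions are clamped to the image border (an assumption).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            # Clamp the sampling position so it stays inside the image.
            sy = min(max(y + dy, 0.0), h - 1.0)
            sx = min(max(x + dx, 0.0), w - 1.0)
            y0, x0 = int(math.floor(sy)), int(math.floor(sx))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = sy - y0, sx - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            out[y][x] = (1 - wy) * top + wy * bottom
    return out


if __name__ == "__main__":
    img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    # A constant displacement of one pixel along x for every output pixel.
    shift = [[(0.0, 1.0)] * 3 for _ in range(3)]
    print(warp_bilinear(img, shift))
```

In a learning-based pipeline the same operation would normally be a differentiable resampling layer, so that gradients from the similarity losses can flow back into the network that predicts the field.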
Feature-level and image-level similarity constraints are imposed jointly, together with a smoothness regularization term that keeps the deformation field spatially continuous and physically plausible. In addition, a cycle-consistent reconstruction module dynamically generates reconstruction targets from the predicted deformation fields. A composite reconstruction loss combining structural similarity (SSIM) and mean squared error (MSE) back-propagates supervision signals to the feature learning process, which further strengthens structural consistency while suppressing modality discrepancies.

The method is evaluated on two public datasets, SR-Reg and BraTS2021. On SR-Reg, the Dice score is 62.24% before registration and 79.58% with FDR-Net, 1.72 percentage points above the second-best method, BSF_Fusion (77.86%). HD95 and ASSD are 2.89 mm and 0.90 mm, respectively, and the deformation fields are smoother and more stable in critical anatomical regions such as the ventricles. On the more challenging BraTS2021 dataset, which includes complex tumor-induced deformations, FDR-Net again performs best, with a Dice score of 86.85%, 1.87 percentage points above BSF_Fusion (84.98%), and with HD95 and ASSD reduced to 4.12 mm and 1.79 mm, respectively. These improvements are achieved with only about 1.0M additional parameters. Ablation studies show that removing the cross-modal mixing strategy, the modality discrimination constraint, or the cycle-consistent reconstruction module lowers Dice by 5.3, 4.8, and 6.1 percentage points, respectively. Feature analysis confirms that the method reduces modality separability in the structural representations, enabling stable modality-invariant feature learning.
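The Dice scores reported above measure the overlap between two segmentation masks, Dice = 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two small binary masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this example; the paper's evaluation code is not shown in the source.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    fixed_mask = [[0, 1, 1, 0],
                  [0, 1, 1, 0]]
    warped_mask = [[0, 0, 1, 1],
                   [0, 1, 1, 0]]
    # 3 shared foreground pixels, 4 foreground pixels in each mask.
    print(dice_score(fixed_mask, warped_mask))  # 0.75
```

Read this way, the SR-Reg result corresponds to an overlap of 0.6224 before registration and 0.7958 after it, a gain of 17.34 percentage points.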
In conclusion, FDR-Net separates modality-specific style information from anatomical structure representations through explicit feature decoupling, cross-modal mixing, multi-level discrimination constraints, and cycle-consistent reconstruction. It significantly improves registration accuracy and robustness while preserving structural integrity. Because it relies on neither generative image translation nor handcrafted similarity metrics, it offers an efficient and generalizable solution for multimodal medical image registration in complex clinical scenarios.
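The abstract names a smoothness regularization term and a composite SSIM plus MSE reconstruction loss but does not give their exact forms. The following sketch shows one common way such terms are written, under stated assumptions: the smoothness term is the mean squared forward difference of the displacement field, SSIM is computed over a single global window rather than averaged over local windows, and the weight `ssim_weight=0.5` is a placeholder. All function names are hypothetical.

```python
def _flatten(image):
    return [float(v) for row in image for v in row]


def mse(a, b):
    """Mean squared error between two images of equal shape."""
    fa, fb = _flatten(a), _flatten(b)
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)


def global_ssim(a, b, data_range=1.0):
    """SSIM over one global window (simplification of windowed SSIM)."""
    fa, fb = _flatten(a), _flatten(b)
    n = len(fa)
    mu_a, mu_b = sum(fa) / n, sum(fb) / n
    var_a = sum((x - mu_a) ** 2 for x in fa) / n
    var_b = sum((y - mu_b) ** 2 for y in fb) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(fa, fb)) / n
    # Standard stabilising constants with K1 = 0.01 and K2 = 0.03.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


def smoothness(flow):
    """Mean squared forward difference of an H x W field of (dy, dx) pairs."""
    h, w = len(flow), len(flow[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            # Compare each pixel with its lower and right neighbours.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    for c in range(2):
                        total += (flow[ny][nx][c] - flow[y][x][c]) ** 2
                        count += 1
    return total / count if count else 0.0


def reconstruction_loss(recon, target, ssim_weight=0.5):
    """Assumed composite form: w * (1 - SSIM) + (1 - w) * MSE."""
    return (ssim_weight * (1.0 - global_ssim(recon, target))
            + (1.0 - ssim_weight) * mse(recon, target))


if __name__ == "__main__":
    target = [[0.0, 0.0], [1.0, 1.0]]
    recon = [[0.0, 0.5], [0.5, 1.0]]
    ramp = [[(0.0, float(x)) for x in range(2)] for _ in range(2)]
    print(reconstruction_loss(recon, target), smoothness(ramp))
```

Both terms are zero in the ideal case (a perfect reconstruction, a constant displacement field) and grow as the reconstruction drifts from its target or the field becomes irregular, which is why they can act as penalties during training.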

XU Jing-wen, TANG Kun, YANG Meng-long, WANG Li-hui. FDR-Net: Multimodal Medical Image Registration via Explicit Feature Disentanglement and Reconstruction Constraints[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260192.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0260192

