Computer Engineering

   

FDR-Net: Multimodal Medical Image Registration via Explicit Feature Disentanglement and Reconstruction Constraints

  

  • Published: 2026-06-11

FDR-Net: Multimodal Medical Image Registration Based on Explicit Feature Disentanglement and Reconstruction Constraints

Abstract: Multi-modal medical image registration aims to achieve accurate spatial alignment of anatomical structures across different imaging modalities. However, because imaging mechanisms differ, intensity distributions and texture characteristics are markedly inconsistent across modalities, which limits the accuracy and robustness of existing methods in complex scenarios. Recently, unsupervised feature disentanglement approaches have partially alleviated the reliance on registration labels. Nevertheless, without explicit constraints they often suppress modality-specific information insufficiently and can degrade key anatomical structures. Effectively eliminating modality discrepancies while preserving structural integrity therefore remains a fundamental challenge in multi-modal medical image registration. To address this issue, this paper proposes a Feature Decoupling and Structural Reconstruction Network (FDR-Net), which establishes a closed-loop framework consisting of feature disentanglement, deformation estimation, and reconstruction verification. Specifically, a feature encoder with global self-attention explicitly decomposes input images into modality-related style representations and modality-invariant structural representations. A modality discrimination constraint further encourages the removal of style information from the structural features. Moreover, a cross-modal feature mixing strategy artificially introduces modality perturbations, making the structural representations more robust to modality variations. In the registration stage, a U-Net-based architecture predicts dense deformation fields from the disentangled structural features.
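The abstract does not say how the cross-modal feature mixing is implemented. As a rough illustration only, the sketch below assumes an AdaIN-style statistic swap: one feature channel is re-normalised to carry the mean and standard deviation of a channel from the other modality, so its "style" changes while the relative pattern of values is kept. The function name `mix_style`, the toy values, and the choice of statistic swapping are all assumptions, not the authors' method.

```python
from statistics import fmean, pstdev


def mix_style(structure_feat, style_feat, eps=1e-5):
    """Give one feature channel the mean/std of another channel.

    Hypothetical AdaIN-style mixing, used here only to illustrate the idea
    of a modality perturbation; FDR-Net's actual mixing rule is not stated
    in the abstract.
    """
    mu_c, sd_c = fmean(structure_feat), pstdev(structure_feat)
    mu_s, sd_s = fmean(style_feat), pstdev(style_feat)
    # Strip the source statistics, then apply the other modality's statistics.
    return [(x - mu_c) / (sd_c + eps) * sd_s + mu_s for x in structure_feat]


# Toy one-dimensional "feature channels" from two modalities (made-up values).
channel_a = [0.1, 0.4, 0.9, 0.3]
channel_b = [5.0, 7.0, 6.0, 8.0]
mixed = mix_style(channel_a, channel_b)
```

In this toy case `mixed` takes on the mean of `channel_b` but keeps the ordering of `channel_a`, which is the property a structure encoder would be expected to stay invariant to.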
Feature-level and image-level similarity constraints are jointly imposed, together with a smoothness regularization term that ensures the spatial continuity and physical plausibility of the deformation field. In addition, a cycle-consistent reconstruction module is incorporated, in which reconstruction targets are dynamically generated from the predicted deformation fields. A composite reconstruction loss, consisting of structural similarity (SSIM) and mean squared error (MSE), back-propagates supervision signals to the feature learning process. This design further strengthens structural consistency while suppressing modality discrepancies. Extensive experiments on two public datasets, SR-Reg and BraTS2021, validate the effectiveness of the proposed method. On the SR-Reg dataset, the Dice score without registration is 62.24%, while FDR-Net achieves 79.58%, outperforming the second-best method, BSF_Fusion (77.86%), by 1.72 percentage points. The HD95 and ASSD are 2.89 mm and 0.90 mm, respectively, and the deformation fields are smoother and more stable in critical anatomical regions such as the ventricles. On the more challenging BraTS2021 dataset, which includes complex tumor-induced deformations, FDR-Net achieves a Dice score of 86.85%, outperforming BSF_Fusion (84.98%) by 1.87 percentage points, with HD95 and ASSD reduced to 4.12 mm and 1.79 mm, respectively. Notably, these improvements are achieved with only approximately 1.0M additional parameters. Ablation studies further show that removing the cross-modal mixing strategy, the modality discrimination constraint, or the cycle-consistent reconstruction module leads to Dice drops of 5.3, 4.8, and 6.1 percentage points, respectively. Feature analysis also confirms that the proposed method reduces modality separability in the structural representations, enabling stable modality-invariant feature learning.
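To make the composite reconstruction loss concrete, here is a minimal sketch that combines an SSIM term and an MSE term on flattened images. It makes several assumptions the abstract does not confirm: SSIM is computed over a single global window rather than sliding windows, intensities lie in [0, 1], and the weights `alpha` and `beta` are placeholders, since no weighting is given.

```python
from statistics import fmean


def mse(a, b):
    """Mean squared error between two equal-length intensity sequences."""
    return fmean([(x - y) ** 2 for x, y in zip(a, b)])


def global_ssim(a, b, data_range=1.0):
    """SSIM over one global window (a simplification of windowed SSIM)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean([(x - mu_a) ** 2 for x in a])
    var_b = fmean([(y - mu_b) ** 2 for y in b])
    cov = fmean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def reconstruction_loss(recon, target, alpha=0.5, beta=0.5):
    # alpha and beta are hypothetical weights; the abstract gives no values.
    return alpha * (1.0 - global_ssim(recon, target)) + beta * mse(recon, target)


target = [0.0, 0.5, 1.0, 0.5]   # toy flattened target image
recon = [0.1, 0.6, 0.9, 0.4]    # toy reconstruction with small errors
loss_value = reconstruction_loss(recon, target)
```

The SSIM term penalises loss of structure (contrast and correlation), while the MSE term penalises raw intensity error, which is one plausible reason for pairing them in a reconstruction objective.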
In conclusion, the proposed FDR-Net effectively disentangles modality-specific style information from anatomical structure representations through explicit feature decoupling, cross-modal mixing, multi-level discrimination constraints, and cycle-consistent reconstruction. It significantly improves registration accuracy and robustness while preserving structural integrity. Without relying on generative image translation or handcrafted similarity metrics, the proposed method provides an efficient and generalizable solution for multi-modal medical image registration in complex clinical scenarios.
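For readers unfamiliar with the headline metric, the following sketch shows how a Dice score is computed from two binary masks and re-derives the percentage-point gaps from the figures quoted in the abstract. The masks are toy data and the function name `dice` is illustrative; the paper's evaluation code is not shown in the source.

```python
def dice(mask_a, mask_b):
    """Dice overlap of two binary masks given as equal-length 0/1 sequences."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(map(bool, mask_a)) + sum(map(bool, mask_b))
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


fixed_mask = [0, 1, 1, 1, 0, 0]    # toy segmentation of the fixed image
warped_mask = [0, 0, 1, 1, 1, 0]   # toy segmentation after warping
score = dice(fixed_mask, warped_mask)   # 2 * 2 / (3 + 3)

# Percentage-point gaps over BSF_Fusion, from the Dice values in the abstract.
gain_sr_reg = round(79.58 - 77.86, 2)
gain_brats = round(86.85 - 84.98, 2)
```

Note that the reported gains are differences in percentage points between two Dice values, not relative improvements.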

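Abstract (translated from the Chinese version): Multi-modal medical image registration aims to achieve accurate spatial alignment of anatomical structures across imaging modalities. Because imaging mechanisms differ, however, modalities are markedly inconsistent in intensity distribution and texture, so existing methods still fall short in registration accuracy and robustness in complex scenarios. In recent years, unsupervised feature disentanglement methods have reduced the dependence on registration labels to some extent, but without explicit constraints they tend to suppress modality information insufficiently and to lose key anatomical structure information. How to eliminate modality differences effectively while keeping structural information intact therefore remains a key challenge in multi-modal medical image registration.
To address these problems, this paper proposes a multi-modal medical image registration method based on explicit feature disentanglement and structural reconstruction constraints (Feature Decoupling and Structural Reconstruction Network, FDR-Net), building a closed-loop learning framework that covers feature disentanglement, deformation estimation, and reconstruction verification. First, a feature encoder with a global self-attention mechanism explicitly decomposes the input images into modality style and anatomical structure information, and a modality discrimination constraint promotes the removal of style information from the structural features. Next, a cross-modal feature mixing mechanism artificially constructs modality perturbations to make the model more robust to modality changes and thereby learn more stable structural representations. In the registration stage, the disentangled structural features are fed to a U-Net that predicts a dense deformation field; structural alignment is achieved through feature-level and image-level similarity constraints, combined with smoothness regularization to ensure the continuity and physical plausibility of the deformation. In addition, a cycle-consistent reconstruction module is introduced. It dynamically generates reconstruction targets from the predicted deformation field and constrains the feature learning process in reverse through a composite reconstruction loss composed of structural similarity (SSIM) and mean squared error (MSE), further strengthening the preservation of key structural information while suppressing modality differences.
To validate the proposed method, systematic evaluations were carried out on two public datasets, SR-Reg and BraTS2021. On the SR-Reg dataset, the Dice score is 62.24% before registration and 79.58% with FDR-Net, 1.72 percentage points higher than the second-best method, BSF_Fusion (77.86%); the HD95 is 2.89 mm and the ASSD is 0.90 mm, and the deformation field is smoother and more stable in key structural regions such as the ventricles. On the more challenging BraTS2021 dataset, FDR-Net still achieves the best performance, with a Dice score of 86.85%, 1.87 percentage points higher than BSF_Fusion (84.98%), and with HD95 and ASSD reduced to 4.12 mm and 1.79 mm, respectively, indicating that it remains robust under the complex deformations caused by tumor lesions. Ablation experiments further show that removing the cross-modal mixing mechanism, the modality discrimination constraint, or the cycle-consistent reconstruction module lowers the Dice score by 5.3, 4.8, and 6.1 percentage points, respectively. Feature analysis also verifies that the model effectively reduces the modality separability of the structural features and achieves stable modality-invariant representation learning.
In summary, through explicit feature disentanglement, cross-modal feature mixing, multiple discrimination constraints, and a cycle-consistent reconstruction mechanism, the proposed FDR-Net effectively separates modality style information from anatomical structure information and significantly improves the accuracy and robustness of multi-modal medical image registration while preserving structural integrity. The method does not rely on generative image translation or handcrafted similarity metrics, and provides an efficient solution with good generalization ability for multi-modal medical image registration in complex clinical scenarios.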