
计算机工程 (Computer Engineering)


A Sparse Dual-Attention Driven Network for Image Super-Resolution

  • Published: 2026-02-11

Abstract: In single-image super-resolution (SISR) tasks, Transformers can effectively capture global dependencies through the self-attention (SA) mechanism, but they suffer from high computational complexity, information redundancy, and large parameter counts, which limit their applicability on low-power devices. To address these issues, a lightweight Feature Aggregation Transformer (FATNet) model is proposed. The model applies spatial and channel self-attention in succession to aggregate features across both dimensions; a sparsification strategy adaptively selects key information along the spatial and channel dimensions to improve the computational efficiency of self-attention; depthwise convolution is applied before the attention matrix is computed to strengthen local context modeling; and a lightweight feed-forward network (SFFN) is built from channel splitting and depthwise separable convolutions, reducing the parameter count while preserving nonlinear representational capacity. Experiments on five widely used benchmark datasets show that, compared with representative lightweight SISR models such as SMFANet and CATANet, FATNet achieves a better trade-off between model parameters and reconstruction performance. Relative to the MAN-light model, FATNet reduces the parameter count by 48% and 47% at upscaling factors ×2 and ×3, respectively, while achieving better reconstruction results. Compared with the recent lightweight super-resolution model CATANet, FATNet improves peak signal-to-noise ratio (PSNR) by up to 0.15 dB and structural similarity (SSIM) by up to 0.0029 with fewer parameters, demonstrating better reconstruction quality.
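The sparsification strategy described in the abstract — adaptively keeping only the most informative positions before computing attention weights — is commonly realized as top-k selection over attention scores. The sketch below is a minimal pure-Python illustration of that general idea for a single query, not the paper's actual FATNet implementation; the function name, the top-k formulation, and the toy dimensions are all assumptions for illustration.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_attention(query, keys, values, k):
    """Attend only to the top-k keys by similarity score.

    query:  list[float]        -- a single query vector
    keys:   list[list[float]]  -- one key vector per token
    values: list[list[float]]  -- one value vector per token
    k:      number of tokens kept after sparsification
    """
    d = len(query)
    # scaled dot-product score of the query against every key
    scores = [sum(q * ki for q, ki in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # sparsification: keep only the k highest-scoring token indices,
    # so low-relevance (redundant) tokens never enter the softmax
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    # weighted sum of the selected value vectors only
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out

# Toy usage: three tokens, keep the two most query-relevant ones.
q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
vs = [[1.0], [10.0], [2.0]]
result = sparse_attention(q, ks, vs, k=2)
```

With `k` equal to the number of tokens this reduces to ordinary dense attention; smaller `k` trades a little expressiveness for less computation in the softmax and the value aggregation, which is the efficiency motivation the abstract cites.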