
Computer Engineering ›› 2026, Vol. 52 ›› Issue (2): 186-196. doi: 10.19678/j.issn.1000-3428.0070426

• Computer Vision and Image Processing •

Superpixel-Guided Transformer Method for Low-Light Image Denoising

SONG Quanzhen, CHEN Zuojun, QIN Pinle, ZENG Jianchao   

  1. School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China
  • Received: 2024-09-30  Revised: 2024-11-27  Published: 2025-01-17

  • About the authors: SONG Quanzhen (CCF student member), male, M.S. candidate; his main research interests are low-level computer vision and low-light image denoising. CHEN Zuojun, Ph.D. QIN Pinle (corresponding author, E-mail: qpl@nuc.edu.cn) and ZENG Jianchao, professors, Ph.D.
  • Funding:
    Science and Technology Major Special Plan of Shanxi Province, "Open Bidding for Selecting the Best Candidates" Project (202101010101018); Changzhi City, Shanxi Province "Open Bidding for Selecting the Best Candidates" Project.

Abstract: Existing low-light image denoising methods mainly rely on the feature extraction and denoising mechanisms of Transformers and Convolutional Neural Networks (CNN). They face two problems: the self-attention mechanism based on local windows fails to fully capture the non-local self-similarity in images, and self-attention computed along the channel dimension does not fully exploit the spatial correlation of images. To address these issues, this study proposes a superpixel-guided strategy for a window-partition-based vision Transformer; the strategy adaptively selects relevant windows for global interaction. First, a Top-N Cross-Attention mechanism (TNCA) is designed based on window interaction: the N windows most similar to the target image window are selected dynamically, and their information is aggregated along the channel dimension, fully exploiting the non-local self-similarity of the image. Second, superpixel segmentation guidance significantly improves the expressive power of local features within each window while strengthening the correlation of spatial features along the channel dimension. Finally, a hierarchical Adaptive Interaction Superpixel-Guided Transformer (AISGFormer) is constructed. Experimental results show that AISGFormer achieves Peak Signal-to-Noise Ratios (PSNR) of 39.98 dB and 40.06 dB on the SIDD and DND real-image datasets, respectively, improvements of 0.02 dB to 14.33 dB and 0.02 dB to 7.63 dB over other advanced networks. AISGFormer interacts local and global information and details more effectively, and it adaptively exploits self-similarity to suppress regionally similar noise.
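The TNCA idea described in the abstract (dynamically selecting the N windows most similar to a target window, then aggregating their information along the channel dimension) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the mean-pooled window descriptors, cosine-similarity ranking, and single-head channel attention are simplifying assumptions made here for clarity.

```python
import numpy as np

def partition_windows(feat, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows,
    returning an array of shape (num_windows, win*win, C)."""
    H, W, C = feat.shape
    f = feat.reshape(H // win, win, W // win, win, C)
    return f.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def top_n_cross_attention(feat, win=4, n=2):
    """Sketch of Top-N cross-attention between windows (assumed details):
    rank windows by cosine similarity of mean-pooled descriptors, then let
    each target window attend to its N most similar windows along the
    channel dimension."""
    windows = partition_windows(feat, win)                # (M, P, C)
    desc = windows.mean(axis=1)                           # (M, C) window descriptors
    desc /= np.linalg.norm(desc, axis=1, keepdims=True) + 1e-8
    sim = desc @ desc.T                                   # (M, M) cosine similarity
    np.fill_diagonal(sim, -np.inf)                        # a window never matches itself
    out = np.empty_like(windows)
    for i, w in enumerate(windows):                       # w: (P, C) target window
        idx = np.argsort(sim[i])[-n:]                     # the N most similar windows
        acc = np.zeros_like(w)
        for j in idx:
            wj = windows[j]                               # (P, C) selected window
            # channel-dimension cross-attention: a (C x C) map between the
            # target window's channels and the selected window's channels
            attn = softmax(w.T @ wj / np.sqrt(w.shape[0]), axis=-1)
            acc += (attn @ wj.T).T                        # mix wj's channels into w
        out[i] = acc / n                                  # average over the N windows
    return out
```

A window size of 4 and N = 2 are arbitrary illustrative choices; the paper's actual window size, similarity measure, and attention parameterization may differ.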

Key words: low-light image denoising, Transformer, cross-attention, non-local self-similarity, real image noise, superpixel


