
Computer Engineering ›› 2024, Vol. 50 ›› Issue (8): 249-258. doi: 10.19678/j.issn.1000-3428.0068267

• Graphics and Image Processing •

Study on Lesion Segmentation of Melanoma Images Based on Swin-Transformer

Hong ZHAO, Xiao WANG*

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China
  • Received: 2023-08-21  Online: 2024-08-15  Published: 2024-08-09
  • Contact: Xiao WANG
  • Supported by: National Natural Science Foundation of China (62166025); Key Research and Development Program of Gansu Province (21YF5GA073)

Abstract:

Mainstream models for lesion segmentation in melanoma images are mostly based on Convolutional Neural Networks (CNN) or Vision Transformer (ViT) networks. However, CNN models are limited by the size of their receptive fields and cannot capture global contextual information, whereas ViT models extract features only at a fixed resolution and cannot extract features of different granularities. To solve this problem, a hybrid dual-branch model based on the Swin-Transformer, named SwinTransFuse, is established. In the encoding stage, a Noise Reduction image-denoising module first removes noise, such as hair, from the image. A dual-branch feature extraction module composed of a CNN and a Swin-Transformer then extracts the local fine-grained information and global contextual information of the image. A Squeeze-and-Excitation (SE) module performs channel attention on the global contextual information from the Swin-Transformer branch to enhance global feature extraction, and a Convolutional Block Attention Module (CBAM) performs spatial attention on the local fine-grained information from the CNN branch to enhance local fine-grained feature extraction. Next, a Hadamard product operation performs feature interaction between the outputs of the two branches to fuse them. Finally, the features output by the SE module, the features output by the CBAM module, and the fused features are concatenated to achieve multilevel feature fusion, and the result is output through a residual block. In the decoding stage, the features are fed into an upsampling module to obtain the final segmentation result. Experimental results show that the mean Intersection over Union (mIoU) values of the model on the ISIC2017 and ISIC2018 skin lesion datasets are 78.72% and 78.56%, respectively, surpassing other medical segmentation models of the same type, which gives the model higher practical value.
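As a concrete illustration of the fusion step described above, the following PyTorch sketch implements one plausible reading of it at a single feature scale: SE channel attention on the Swin-Transformer branch, CBAM-style spatial attention on the CNN branch, a Hadamard-product interaction, concatenation, and a residual output block. The channel width, reduction ratio, 1×1 projection, and exact layers of the residual block are illustrative assumptions, not the authors' published implementation.

# A minimal sketch, assuming PyTorch; module and layer choices are
# illustrative, not the authors' released code.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # channel-wise reweighting


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over channel-pooled maps."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)  # average over channels
        mx = x.amax(dim=1, keepdim=True)   # max over channels
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w  # position-wise reweighting


class BranchFusion(nn.Module):
    """Fuse one Swin-Transformer feature map with one CNN feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.se = SEBlock(channels)    # global-context (Swin) branch
        self.sa = SpatialAttention()   # local fine-grained (CNN) branch
        self.proj = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.body = nn.Sequential(     # assumed form of the residual block
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, swin_feat, cnn_feat):
        g = self.se(swin_feat)               # channel attention on global features
        s = self.sa(cnn_feat)                # spatial attention on local features
        fused = g * s                        # Hadamard-product feature interaction
        cat = torch.cat([g, s, fused], 1)    # multilevel feature fusion
        x = self.proj(cat)                   # project back to `channels`
        return torch.relu(x + self.body(x))  # residual output


if __name__ == "__main__":
    fusion = BranchFusion(channels=96)        # 96 = Swin-T stage-1 width (assumed)
    swin_feat = torch.randn(1, 96, 56, 56)
    cnn_feat = torch.randn(1, 96, 56, 56)
    print(fusion(swin_feat, cnn_feat).shape)  # torch.Size([1, 96, 56, 56])

The Hadamard product (element-wise multiplication) lets each branch gate the other's activations, which is why both attention outputs are kept alongside their product in the final concatenation rather than being discarded.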
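For reference, the reported mIoU values presumably follow the standard definition, averaging per-class Intersection over Union between the predicted region P_i and ground-truth region G_i over the N classes (here N = 2, lesion and background, for binary segmentation); evaluation conventions vary across papers, so this is the usual reading rather than a confirmed protocol detail:

$$\mathrm{mIoU} = \frac{1}{N}\sum_{i=1}^{N}\frac{|P_i \cap G_i|}{|P_i \cup G_i|}$$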

Key words: Swin-Transformer model, melanoma, feature fusion, noise reduction, ISIC2018 dataset