
Computer Engineering ›› 2023, Vol. 49 ›› Issue (10): 162-170. doi: 10.19678/j.issn.1000-3428.0065985

• Graphics and Image Processing •

  • About the authors:

    XU Pengquan (1996—), male, M.S., his main research interest is deep learning

    LIANG Yuxiang, M.S.

    LI Ying, associate professor, Ph.D.

  • Fund programs:
    National Natural Science Foundation of China (61802216); China Postdoctoral Science Foundation (2018M642613); Natural Science Foundation of Shandong Province (ZR2017PF013)

Medical Image Segmentation Fusing Multi-Scale Semantic and Residual Bottleneck Attention

Pengquan XU, Yuxiang LIANG, Ying LI   

  1. College of Computer Science and Technology, Qingdao University, Qingdao 266071, Shandong, China
  • Received: 2022-10-13 Online: 2023-10-15 Published: 2023-10-10


Abstract:

In practical applications, U-Net's use of a single convolution kernel size, together with the semantic gap between encoder and decoder across skip connections, reduces its generalization performance when segmenting different types of medical images. To address this, a Lightweight and Flexible medical image segmentation model based on the U-Net structure (LFUNet) is constructed. In the encoder and decoder, a Multi-scale Semantic (MS) module is designed; each MS module replaces a larger convolution kernel with equivalent sequences of small convolution kernels, obtaining different receptive fields and thereby capturing semantic features at different levels. A Residual Bottleneck Attention (RBA) module integrating the residual bottleneck structure and an attention mechanism is established; embedding the RBA module in the skip connections narrows the semantic gap between encoder and decoder and makes the model focus on the target region. The small-kernel sequences of the MS module and the inverted residual structure of the RBA module require few parameters, so the total number of parameters in LFUNet is only one third that of U-Net, which greatly reduces model complexity and improves network efficiency. Comparative experiments on four public biomedical image datasets show that the mean Jaccard coefficient of LFUNet is 3.184 6, 11.936 6, 4.243 8, and 0.114 4 percentage points higher, respectively, than that of U-Net, demonstrating higher segmentation accuracy and better generalization performance.
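The efficiency claim behind the MS module — that a sequence of small kernels covers the same receptive field as one large kernel while using fewer weights — can be checked with simple arithmetic. The sketch below is illustrative only; the helper names are not from the paper, and the 64-channel configuration is an assumed example rather than LFUNet's actual layer widths:

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of a stack of convolutions: the field grows by
    (k - 1) * jump at each layer, where jump is the product of all
    preceding strides."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

def conv_params(k, c_in, c_out, bias=False):
    """Weight count of a single k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Two stacked 3x3 convolutions see the same 5x5 window as one 5x5 conv...
assert receptive_field([3, 3]) == receptive_field([5]) == 5
# ...and three stacked 3x3 convolutions match a 7x7:
assert receptive_field([3, 3, 3]) == receptive_field([7]) == 7

# ...but the stacked version needs fewer weights (64 -> 64 channels):
print(2 * conv_params(3, 64, 64))  # 73728
print(conv_params(5, 64, 64))      # 102400
```

The same accounting underlies the RBA module's inverted residual structure: factoring one dense convolution into cheaper stages trades a small amount of structure for a large reduction in parameters, which is how LFUNet reaches roughly a third of U-Net's parameter count.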

Key words: deep learning, semantic segmentation, U-Net structure, residual bottleneck structure, attention mechanism