Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 329-338. doi: 10.19678/j.issn.1000-3428.0067208

• Development Research and Engineering Application •

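  • Received: 2023-03-20 Publication date: 2024-01-15 Release date: 2024-01-11
  • Corresponding author: Xiaowei FU
  • Supported by:
    National Natural Science Foundation of China (61873323); National Natural Science Foundation of China (U2066202); Key Research and Development Program of Guangdong Province (2022B0111130004); Shenzhen Science and Technology Innovation Key Basic Research Project (JCYJ20210324115606017)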

Electrode Microscopic Image Segmentation Method by Fusing Multi-layer Perceptual Attention

Wei XU1,2, Xiaowei FU1,2,*, Xi LI3, Yaokun WANG1,2

  1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, Hubei, China
  3. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
  • Received: 2023-03-20 Online: 2024-01-15 Published: 2024-01-11
  • Contact: Xiaowei FU

Abstract:

To address blurred material edges, artifacts, and uneven grayscale in electrode microscopic images of NOx sensors, a semantic segmentation method for electrode microscopic images that fuses multi-layer perceptual attention is proposed, with U-Net as the base model. First, the feature maps output at different scales by the U-Net encoder layers are reduced in dimensionality with 3×3 convolutions and brought to a common scale by bilinear interpolation; the resulting multi-scale feature fusion strengthens feature extraction and compensates for the feature loss caused by downsampling in the encoder. Second, spatial pyramid pooling is added to extract multi-scale information, with 1×1 convolutions used to reduce the computational cost, and a multi-layer perceptual attention module is proposed to capture the spatial-position and channel dependencies of the backbone feature map and the semantically enhanced feature map. Finally, the similarity between feature maps carrying different semantic information is computed and combined with the cross-entropy loss to form a loss function that captures spatial similarity; it supervises key information during training and helps the backbone feature map learn spatial position information, improving segmentation performance. Experimental results show that the proposed method achieves a Mean Pixel Accuracy (MPA) of 96.75%, a Mean Intersection over Union (MIoU) of 94.04%, and a Micro-F1 score of 96.92%, with 7.78×10⁹ floating-point operations (FLOPs) and 8.08×10⁶ network parameters. Compared with models such as U-Net and SegNet, the proposed method effectively mitigates edge blurring and material artifacts at the cost of only a small increase in model complexity, captures spatial position and channel information, preserves image detail, and improves segmentation accuracy.

Key words: electrode, microscopic image, NOx sensor, semantic segmentation, perceptual attention
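
The abstract reports MPA, MIoU, and Micro-F1 without stating how they are computed. The sketch below shows one common way to derive all three from a pixel-level confusion matrix, assuming the standard textbook definitions; it is not the authors' evaluation code, the function names `confusion_matrix` and `segmentation_metrics` are hypothetical, and the paper's exact definitions (for example, how absent classes are treated) may differ.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[i][j] counts pixels whose true class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Return MPA, MIoU and Micro-F1 under the usual definitions (an assumption)."""
    n = len(cm)
    tp = [cm[c][c] for c in range(n)]
    fn = [sum(cm[c]) - tp[c] for c in range(n)]                       # row sum minus diagonal
    fp = [sum(cm[r][c] for r in range(n)) - tp[c] for c in range(n)]  # column sum minus diagonal

    # Per-class pixel accuracy and IoU; classes with an empty denominator are skipped.
    acc = [tp[c] / (tp[c] + fn[c]) for c in range(n) if tp[c] + fn[c] > 0]
    iou = [tp[c] / (tp[c] + fp[c] + fn[c])
           for c in range(n) if tp[c] + fp[c] + fn[c] > 0]

    # Micro-F1 pools TP, FP and FN over all classes before forming the F1 ratio.
    pooled = 2 * sum(tp) + sum(fp) + sum(fn)
    return {
        "MPA": sum(acc) / len(acc) if acc else 0.0,
        "MIoU": sum(iou) / len(iou) if iou else 0.0,
        "MicroF1": 2 * sum(tp) / pooled if pooled else 0.0,
    }


if __name__ == "__main__":
    # Toy example: ten flattened pixel labels over three classes.
    y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    print(segmentation_metrics(confusion_matrix(y_true, y_pred, 3)))
```

With one label per pixel, the pooled false positives equal the pooled false negatives, so Micro-F1 under this definition coincides with overall pixel accuracy. MPA and MIoU average per class instead, so small classes weigh as much as large ones.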