
Computer Engineering ›› 2023, Vol. 49 ›› Issue (1): 242-249. doi: 10.19678/j.issn.1000-3428.0063652

• Graphics and Image Processing •

Pyramid Scene Parsing Network Based on Improved Self-Attention Mechanism

ZHENG Qiumei1, XU Linkang1, WANG Fenghua1, LIN Chao2

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China;
  2. Information Construction Department, China University of Petroleum (East China), Qingdao, Shandong 266580, China
  • Received: 2021-12-29 Revised: 2022-03-06 Published: 2022-07-04
  • About the authors: ZHENG Qiumei (born 1964), female, professor, whose main research interests are image processing and object detection; XU Linkang, master's student; WANG Fenghua, lecturer, Ph.D.; LIN Chao, senior engineer, M.S.
  • Funding:
    National Natural Science Foundation of China, "Heterogeneous Condensation Mechanism of Natural Gas Based on Supersonic Expansion" (52074341); National Natural Science Foundation of China, "Study on the Erosion Mechanism of Sand Particles in Multiphase Flow Pipelines" (51874340); Fundamental Research Funds for the Central Universities (19CX02030A).

Abstract: The Pyramid Scene Parsing Network (PSPNet) loses image detail as network depth increases, which leads to poor semantic segmentation of small objects and object edges and to inaccurate pixel-class prediction. To address this problem, this paper proposes a PSPNet method based on an improved self-attention mechanism. The channel attention module and the spatial attention module of the self-attention mechanism are added to the backbone network and to the enhanced feature extraction network of PSPNet, respectively, so that the two sub-networks extract the more important feature details of the image along the channel and spatial dimensions. Because existing image dimensionality reduction algorithms do little to improve the computational efficiency of the self-attention mechanism, the paper analyzes how the order of the "words" affects the result of the self-attention computation, designs a new image dimensionality reduction algorithm based on Hilbert Curve (HC) traversal, and adds this algorithm to the spatial self-attention module to improve its computational capability. Simulation results show that the method improves accuracy on both the PASCAL VOC 2012 and polyp segmentation datasets and segments small objects and object edges more finely. On the VOC 2012 training set, the mean Intersection over Union (mIoU) and the mean pixel accuracy reach 75.48% and 85.07%, which are 0.68 and 1.35 percentage points higher than the baseline algorithm, respectively.

Key words: semantic segmentation, Pyramid Scene Parsing Network (PSPNet), self-attention mechanism, image dimensionality reduction, Hilbert Curve (HC)
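The abstract does not spell out the dimensionality reduction algorithm, so the following is only a minimal sketch of the general idea of Hilbert Curve traversal: flattening a 2-D feature map into a 1-D sequence of "words" so that neighbouring sequence positions are also neighbouring pixels. It assumes a square feature map whose side is a power of two; the function names (`hilbert_d2xy`, `hilbert_order`, `hilbert_flatten`, `hilbert_unflatten`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def hilbert_d2xy(n, d):
    """Map distance d along the Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/reflect the current quadrant so the sub-curves connect.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """Row-major pixel indices listed in Hilbert-curve visiting order."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two (assumption of this sketch)")
    coords = [hilbert_d2xy(n, d) for d in range(n * n)]
    return np.array([y * n + x for x, y in coords])


def hilbert_flatten(feature):
    """Flatten a (C, N, N) feature map to (C, N*N) along the Hilbert curve."""
    c, h, w = feature.shape
    if h != w:
        raise ValueError("this sketch assumes a square feature map")
    order = hilbert_order(h)
    return feature.reshape(c, h * w)[:, order], order


def hilbert_unflatten(seq, order, n):
    """Inverse of hilbert_flatten: restore the (C, N, N) layout."""
    flat = np.empty_like(seq)
    flat[:, order] = seq
    return flat.reshape(seq.shape[0], n, n)


# Usage: a toy 2-channel 4 x 4 feature map.
feature = np.arange(2 * 4 * 4).reshape(2, 4, 4)
sequence, order = hilbert_flatten(feature)
restored = hilbert_unflatten(sequence, order, 4)
```

Unlike a plain row-major reshape, where the last pixel of one row is followed by the first pixel of the next, every step along this sequence moves to a directly adjacent pixel, which is the locality property that makes the "word" order relevant to a sequence-based attention computation.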
