Pyramid Scene Parsing Network Based on Improved Self-Attention Mechanism

Computer Engineering (计算机工程), 2023, Vol. 49, Issue 1: 242-249. doi: 10.19678/j.issn.1000-3428.0063652

ZHENG Qiumei (郑秋梅)¹, XU Linkang (徐林康)¹, WANG Fenghua (王风华)¹, LIN Chao (林超)²

1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China;
2. Information Construction Department, China University of Petroleum (East China), Qingdao, Shandong 266580, China

Received: 2021-12-29 Revised: 2022-03-06 Published: 2022-07-04
About the authors: ZHENG Qiumei (born 1964), female, professor, main research interests: image processing and object detection; XU Linkang, master's student; WANG Fenghua, lecturer, Ph.D.; LIN Chao, senior engineer, master's degree.
Funding:
National Natural Science Foundation of China, "Mechanism of heterogeneous condensation of natural gas based on supersonic expansion" (52074341); National Natural Science Foundation of China, "Research on the erosion mechanism of sand particles in multiphase flow pipelines" (51874340); Fundamental Research Funds for the Central Universities (19CX02030A).

Abstract

In the Pyramid Scene Parsing Network (PSPNet), image detail is lost as the network deepens, which leads to poor semantic segmentation of small objects and object edges and to inaccurate pixel-class prediction. To address this problem, this paper proposes a PSPNet method based on an improved self-attention mechanism. The channel attention module and the spatial attention module of the self-attention mechanism are added to the backbone network and the enhanced feature extraction network of PSPNet, respectively, so that the two sub-networks extract the more important feature details of the image from the channel and spatial perspectives. Because existing image dimensionality reduction algorithms do little to improve the computational efficiency of the self-attention mechanism, a new image dimensionality reduction algorithm is designed using Hilbert Curve (HC) traversal, based on an analysis of how the order of "words" affects the results of self-attention computation. This algorithm is added to the spatial self-attention module to improve its computing capability. Simulation results show that the method improves accuracy on both the PASCAL VOC 2012 and polyp segmentation datasets, and that small objects and object edges are segmented more finely. On the VOC 2012 training set, the mean Intersection over Union (mIoU) and mean Pixel Accuracy (mPA) reach 75.48% and 85.07%, which are 0.68 and 1.35 percentage points higher than the baseline algorithm, respectively.
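To make the idea of flattening an image along a Hilbert curve concrete, the sketch below reorders a square grid into a one-dimensional sequence using the standard index-to-coordinate mapping for a Hilbert curve. It is only an illustration of the traversal itself, not the dimensionality reduction algorithm of the paper, whose details are not given on this page. The function names `hilbert_d2xy` and `hilbert_flatten` are hypothetical, and the sketch assumes a square grid whose side length is a power of two.

```python
def hilbert_d2xy(order, d):
    """Return the (x, y) cell visited at step d of a Hilbert curve
    covering a 2**order x 2**order grid (standard iterative mapping)."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current sub-square so the curve stays connected.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(grid):
    """Flatten a square grid (list of rows) in Hilbert-curve order.

    Assumption for this sketch: the side length is a power of two.
    """
    n = len(grid)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a power-of-two side length")
    order = n.bit_length() - 1
    return [grid[y][x] for x, y in (hilbert_d2xy(order, d) for d in range(n * n))]


# A 2 x 2 example: the curve visits (0,0), (0,1), (1,1), (1,0) as (x, y).
print(hilbert_flatten([[0, 1], [2, 3]]))  # [0, 2, 3, 1]
```

Unlike row-major flattening, where the last cell of one row and the first cell of the next are far apart in the image, every pair of consecutive elements in the Hilbert sequence comes from neighboring cells, so the one-dimensional order preserves more spatial locality.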

Key words: semantic segmentation, Pyramid Scene Parsing Network (PSPNet), self-attention mechanism, image dimensionality reduction, Hilbert Curve (HC)
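The two segmentation metrics reported in the abstract can be computed from a per-class confusion matrix. The sketch below uses the common definitions: per-class IoU is TP / (TP + FP + FN), and per-class pixel accuracy is TP divided by the number of ground-truth pixels of that class, each averaged over classes. The function names are hypothetical, and skipping classes that never occur is an assumption of this sketch; the paper's exact evaluation protocol is not given on this page.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with true class t that were predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou_and_mpa(cm):
    """Return (mIoU, mPA) as fractions in [0, 1].

    Assumption for this sketch: a class that appears in neither the labels
    nor the predictions is left out of the corresponding average.
    """
    k = len(cm)
    ious, accs = [], []
    for i in range(k):
        tp = cm[i][i]
        gt = sum(cm[i])                          # pixels whose true class is i
        pred = sum(cm[r][i] for r in range(k))   # pixels predicted as class i
        union = gt + pred - tp                   # TP + FP + FN
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)
    if not ious or not accs:
        raise ValueError("confusion matrix contains no labeled pixels")
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Four pixels, two classes: one pixel of class 0 is mislabeled as class 1.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
miou, mpa = mean_iou_and_mpa(cm)
print(cm, round(miou, 4), round(mpa, 4))  # [[1, 1], [0, 2]] 0.5833 0.75
```

In the toy example, class 0 has IoU 1/2 and accuracy 1/2, while class 1 has IoU 2/3 and accuracy 1, which gives the averages shown in the comment.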

CLC number: TP391.41

Cite this article: ZHENG Qiumei, XU Linkang, WANG Fenghua, LIN Chao. Pyramid Scene Parsing Network Based on Improved Self-Attention Mechanism[J]. Computer Engineering, 2023, 49(1): 242-249.

http://www.ecice06.com/CN/Y2023/V49/I1/242


