Semantic Segmentation Algorithm Based on Attention Mechanism and Auxiliary Task

doi:10.19678/j.issn.1000-3428.0058447

Abstract

Abstract: When applied to semantic segmentation, existing convolutional neural network models suffer from low dispersion of low-level features, which degrades segmentation performance. To address this problem, a semantic segmentation algorithm based on auxiliary loss, an auxiliary edge detection task, and an attention mechanism is proposed, using a fully convolutional neural network as the basic model. The auxiliary loss branch of the network is redesigned so that the low-level features encode more semantic information. In multi-task learning, edge detection is chosen as the auxiliary task, and its branch is designed based on the attention mechanism so that the network pays more attention to the shape and edge information of objects. Finally, the basic model, the auxiliary loss branch, and the auxiliary task branch are integrated into the semantic segmentation model. Experimental results on the VOC2012 dataset show that the proposed algorithm achieves a mean intersection-over-union of 71.5%, which is 6 percentage points higher than the basic model.

Key words: attention mechanism, auxiliary task, auxiliary loss, multi-task learning, semantic segmentation
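
The headline metric in the abstract, mean intersection-over-union (mIoU), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the optional `ignore_index` parameter (VOC-style annotations commonly mark boundary pixels with a void label such as 255) are assumptions made for illustration only.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels,
    one entry per pixel (hypothetical input layout).
    """
    inter = [0] * num_classes  # pixels where prediction and label agree
    union = [0] * num_classes  # pixels assigned to the class by either one
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # skip void pixels in the ground truth
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # Classes absent from both prediction and label are left out of the mean.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes.
# Per-class IoU is 1/3, 2/3 and 1/2, so the mean is 0.5.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, num_classes=3), 3))
```

Whether absent classes are excluded or accumulated over the whole dataset before averaging is a design choice; benchmark evaluations typically accumulate intersections and unions over all images first, which this per-call sketch does not do.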

CLC number: TP391

YE Jianfeng, XU Ke, XIONG Junfeng, WANG Huaming. Semantic Segmentation Algorithm Based on Attention Mechanism and Auxiliary Task[J]. Computer Engineering, 2021, 47(9): 203-209,216.

https://www.ecice06.com/CN/Y2021/V47/I9/203

Figures/Tables: 13

