Clothing Image Segmentation Network Based on Improved Deeplab v3+

doi:10.19678/j.issn.1000-3428.0062392

Abstract

Clothing image segmentation suffers from rough segmentation of clothing edges, unsatisfactory segmentation accuracy, and insufficient extraction of deep semantic features. To address these problems, this study embeds the Coordinate Attention (CA) mechanism and a Semantic Feature Enhancement Module (SFEM) into the Deeplab v3+ network, which already offers good semantic segmentation performance, and proposes the CA_SFEM_Deeplab v3+ network for clothing image segmentation. To strengthen the learning of effective features in clothing images, the CA module is embedded into ResNet101, the backbone of Deeplab v3+. The feature map produced by the Atrous Spatial Pyramid Pooling (ASPP) module is then fed into the SFEM for feature enhancement, which improves segmentation accuracy. Experimental results on the DeepFashion2 dataset show that CA_SFEM_Deeplab v3+ achieves a mean Intersection over Union (mIoU) of 0.557 and a Mean Pixel Accuracy (MPA) of 0.671, which are 2.1% and 2.3% higher, respectively, than those of Deeplab v3+. Compared with Deeplab v3+, the proposed CA_SFEM_Deeplab v3+ segments clothing contours more finely and delivers better overall segmentation performance.
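The abstract does not describe how the CA module is wired into ResNet101. As a rough illustration only, the NumPy sketch below follows the coordinate attention formulation of Hou, Zhou and Feng (CVPR 2021): pool the feature map separately along width and height, pass the concatenated result through a shared 1x1 convolution, split it back into two direction-specific attention maps, and reweight the input. This is not the authors' code: the function names, the random weights, the reduction ratio r = 4, and the omission of the batch axis and BatchNorm are all assumptions made to keep the example small.

```python
import numpy as np


def h_swish(x):
    # Hard-swish non-linearity: x * ReLU6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_shared, w_h, w_w):
    """Coordinate attention on one feature map (batch axis and BatchNorm omitted).

    x:        (C, H, W) input feature map
    w_shared: (C_mid, C) weights of the shared 1x1 convolution
    w_h, w_w: (C, C_mid) weights of the 1x1 convolutions for the H and W branches
    """
    _, h, _ = x.shape
    # Direction-aware pooling: average over width -> (C, H), over height -> (C, W)
    z_h = x.mean(axis=2)
    z_w = x.mean(axis=1)
    # Concatenate along the spatial axis; a 1x1 conv on a (C, L) map is a matrix product
    f = h_swish(w_shared @ np.concatenate([z_h, z_w], axis=1))  # (C_mid, H + W)
    f_h, f_w = f[:, :h], f[:, h:]
    # Attention weights in (0, 1) for each row position and each column position
    g_h = sigmoid(w_h @ f_h)  # (C, H)
    g_w = sigmoid(w_w @ f_w)  # (C, W)
    # Reweight: y[c, i, j] = x[c, i, j] * g_h[c, i] * g_w[c, j]
    return x * g_h[:, :, None] * g_w[:, None, :]


# Usage with random weights (C = 16, hypothetical reduction ratio r = 4)
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 12, 4
x = rng.standard_normal((C, H, W))
w_shared = 0.1 * rng.standard_normal((C // r, C))
w_h = 0.1 * rng.standard_normal((C, C // r))
w_w = 0.1 * rng.standard_normal((C, C // r))
y = coordinate_attention(x, w_shared, w_h, w_w)
print(y.shape)  # (16, 8, 12)
```

Because the two attention maps are factorized along height and width, each output position is weighted by where it lies in both directions. This positional sensitivity is what motivates using CA for finer clothing contours, as opposed to a single per-channel weight as in squeeze-and-excitation.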
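The two reported metrics, mIoU and MPA, are usually computed from a per-class confusion matrix. The sketch below uses those standard definitions: IoU = TP / (TP + FP + FN) and per-class pixel accuracy = TP / (ground-truth pixels of that class), each averaged over classes. The abstract does not state the exact evaluation protocol (for example, whether background is counted or how classes absent from an image are handled), so treat this as an assumption. The helper names are hypothetical.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    # cm[i, j] = number of pixels with ground-truth class i predicted as class j
    idx = gt.ravel() * num_classes + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def miou_and_mpa(cm):
    tp = np.diag(cm).astype(float)
    gt_count = cm.sum(axis=1)    # pixels of each class in the ground truth
    pred_count = cm.sum(axis=0)  # pixels predicted as each class
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (gt_count + pred_count - tp)  # per-class IoU
        pa = tp / gt_count                       # per-class pixel accuracy
    # Undefined ratios (e.g. a class absent from the ground truth) are NaN and skipped
    return float(np.nanmean(iou)), float(np.nanmean(pa))


# Toy 2x3 label maps with 3 classes
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(gt, pred, num_classes=3)
miou, mpa = miou_and_mpa(cm)
print(round(miou, 4), round(mpa, 4))  # 0.5 0.6667
```

In the toy example the per-class IoUs are 1/3, 2/3 and 1/2, and the per-class pixel accuracies are 1/2, 1 and 1/2. This shows why MPA is typically higher than mIoU, consistent with the 0.671 versus 0.557 reported above: pixel accuracy ignores false positives, whereas IoU penalizes them.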

Key words: clothing image, semantic segmentation, Deeplab v3+ network, Coordinate Attention mechanism, semantic feature enhancement module


CLC Number: TP391.41

HU Xinrong, GONG Chuang, ZHANG Zili, ZHU Qiang, PENG Tao, HE Ruhan. Clothing Image Segmentation Network Based on Improved Deeplab v3+[J]. Computer Engineering, 2022, 48(7): 284-291.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0062392

http://www.ecice06.com/EN/Y2022/V48/I7/284

Figures/Tables: 10