Real-Time Scene Segmentation Algorithm for Indoor Service Robot

doi:10.19678/j.issn.1000-3428.0059577

Abstract

Abstract: Real-time scene segmentation in indoor environments is a key technology for indoor service robots. Semantic segmentation has advanced considerably, but most existing methods rely on complex network structures or models that raise accuracy at the price of higher computational and deployment cost. To address the limited computing resources of mobile robots, a lightweight bottleneck structure is designed and used as the basic building block of a lightweight scene segmentation network. The network is cascaded with a feature extraction network to obtain deeper semantic features, and it fuses shallow features with deep semantic features to obtain richer image features. It combines depthwise separable convolution with multi-scale dilated convolution to extract multi-scale image features while reducing the model's parameter count and computation. A channel attention mechanism is also introduced to improve segmentation accuracy. Experiments with 512×512 pixel input images show that the proposed algorithm reaches an MIoU of 72.7% on the NYUDv2 indoor scene segmentation dataset and 59.9% on the CamVid dataset, while requiring only 4.2 GFLOPs of computation and 8.3 Mb of parameters. Deployed on the NVIDIA Jetson Xavier NX embedded platform for mobile robots, the algorithm achieves an inference speed of 42 frame/s, significantly outperforming DeepLabV3+, PSPNet, SegNet and UNet in real-time performance.
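The MIoU figures quoted in the abstract are mean intersection-over-union scores. As a minimal sketch, assuming the standard definition (per-class IoU = TP / (TP + FP + FN), averaged over the classes that occur), the metric can be computed as follows. The function name `mean_iou` and the toy label sequences are illustrative assumptions and are not taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over two flat sequences of class labels."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical labels): six pixels, three classes
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In practice the label sequences would be the flattened prediction and ground-truth maps of each test image, with the confusion matrix accumulated over the whole dataset before averaging.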

Key words: lightweight network, scene segmentation, depthwise separable convolution, dilated convolution, attention mechanism
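Two of the techniques named in the key words, depthwise separable convolution and dilated convolution, are what allow the network to cut parameters while still covering multiple scales. The sketch below works through the usual textbook arithmetic for both. The channel counts, kernel size, and dilation rates are illustrative assumptions, not the configuration used in the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def dilated_kernel_extent(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return dilation * (k - 1) + 1


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(std, sep, round(std / sep, 2))

# The same 3 x 3 kernel spans a wider area as the dilation rate grows
for rate in (1, 2, 4):
    print(rate, dilated_kernel_extent(3, rate))
```

The parameter count of a dilated kernel does not depend on its dilation rate, which is why stacking several rates in parallel is a cheap way to gather multi-scale context.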

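Abstract (translated from the Chinese): Real-time scene segmentation in indoor scenes is a key technology for developing indoor service robots. Research on semantic segmentation has made major progress, but most methods tend to design complex network structures or computationally expensive models to raise accuracy metrics while ignoring the actual deployment cost. To address the limited computing power of mobile robots, a lightweight bottleneck structure is designed and used as the basic element of a lightweight scene segmentation network. The network is cascaded with a feature extraction network to obtain deeper semantic features, and fuses shallow features with deep semantic features to obtain richer image features. It combines depthwise separable convolution with multi-scale dilated convolution to extract multi-scale image features, reducing the model's parameter count and computation, and uses a channel attention mechanism to improve segmentation accuracy during feature weighting. Experiments with 512×512 pixel images as input show that the algorithm reaches an MIoU of 72.7% on the NYUDv2 indoor scene segmentation dataset and 59.9% on the CamVid dataset, with a computational cost of 4.2 GFLOPs and a parameter size of only 8.3 Mb. On the NVIDIA Jetson Xavier NX embedded platform for mobile robots it reaches 42 frame/s, with better real-time performance than DeepLabV3+, PSPNet, SegNet and UNet.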

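Key words (translated from the Chinese): lightweight network, scene segmentation, depthwise separable convolution, dilated convolution, attention mechanism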

CLC Number: TP391

LIN Jie, CHEN Chunmei, LIU Guihua, ZHU Lijia. Real-Time Scene Segmentation Algorithm for Indoor Service Robot[J]. Computer Engineering, 2021, 47(7): 21-29.

林杰, 陈春梅, 刘桂华, 祝礼佳. 室内服务机器人的实时场景分割算法[J]. 计算机工程, 2021, 47(7): 21-29.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0059577

https://www.ecice06.com/EN/Y2021/V47/I7/21


