Indoor Scene Understanding Based on Depth-Aware Feature Extraction

doi:10.19678/j.issn.1000-3428.0058091

Abstract

Abstract: Jointly learning RGB image features and 3D geometric information from the RGB-D domain benefits semantic segmentation of indoor scenes. However, traditional segmentation methods usually require precise depth maps as input, which severely limits their range of application. To address this problem, this paper proposes a new network framework for indoor scene understanding. A joint learning network model, built on semantic feature and depth feature extraction networks, extracts depth-aware features. A geometry-guided depth feature transfer module and a pyramid feature fusion module then combine the learned depth features, multi-scale spatial information, and semantic features into a more expressive feature representation, enabling more accurate semantic segmentation of indoor scenes. Experimental results show that the joint learning network model achieves an average segmentation accuracy of 69.5% on NYU-Dv2 and 68.4% on SUN RGBD, giving better indoor semantic segmentation performance and broader applicability than traditional segmentation methods.

Key words: semantic feature, depth feature, feature fusion, indoor scene understanding, geometric information, depth-aware feature
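
The abstract reports an average segmentation accuracy but does not define the metric. As a minimal sketch, assuming it means per-class pixel accuracy averaged over the classes present in the ground truth, such a score could be computed as below. The function name `mean_class_accuracy` and the `ignore_label` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def mean_class_accuracy(pred, truth, ignore_label=None):
    """Average of per-class pixel accuracies over flat label sequences.

    Assumption: this is one common reading of "average segmentation
    accuracy"; the paper may use a different definition.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    correct = defaultdict(int)  # correctly labeled pixels per true class
    total = defaultdict(int)    # all pixels per true class
    for p, t in zip(pred, truth):
        if t == ignore_label:
            continue  # skip unlabeled pixels
        total[t] += 1
        if p == t:
            correct[t] += 1
    if not total:
        raise ValueError("no labeled pixels")
    # Each class contributes equally, regardless of its pixel count.
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 2]
    pred = [0, 0, 1, 1, 1, 0]
    # Per-class accuracies: class 0 = 2/3, class 1 = 2/2, class 2 = 0/1.
    print(mean_class_accuracy(pred, truth))
```

Because every class carries equal weight, a small class that is always missed lowers this score as much as a large one, which is why it can differ sharply from overall pixel accuracy on indoor datasets with many rare object categories.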

CLC number: TN183

CHEN Suting, ZHANG Liangchen. Indoor Scene Understanding Based on Depth-Aware Feature Extraction[J]. Computer Engineering, 2021, 47(6): 217-224. (in Chinese)

http://www.ecice06.com/CN/Y2021/V47/I6/217

Figures/Tables: 9

