Stereo Matching Network Based on Asymmetric Spatial Pyramid Pooling

doi:10.19678/j.issn.1000-3428.0055428

Abstract

Convolutional Neural Networks (CNNs) are widely used in image processing algorithms because of their strong representation capability, but they are time-consuming and lose information during processing. To address these problems, this paper proposes a CNN structure based on an Asymmetric Spatial Pyramid Pooling (ASPP) model. An asymmetric pyramid pooling method is designed and integrated into the stereo matching network to obtain more detailed image feature information. Convolutional layers with 3×3 and 1×1 kernels are then stacked to fuse multi-scale information and speed up network convergence, and the network is deepened from four layers to seven to improve matching accuracy. Disparity prediction is performed on the KITTI and Middlebury datasets. Experimental results show that, compared with the baseline network, the proposed structure shortens convergence time by about 50.1% and reduces the matching error rate from 6.65% to 4.78%, yielding smoother disparity maps in stereo matching.

Key words: Convolutional Neural Network (CNN), Asymmetric Spatial Pyramid Pooling (ASPP), multi-scale fusion, information loss, stereo matching
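
The pyramid pooling idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the pooling window shapes, the pooling type, or how the branches are merged, so everything specific here is an assumption: "asymmetric" is read as non-square (height ≠ width) windows, average pooling is used, and each branch is upsampled back with nearest-neighbour repetition. The names `avg_pool`, `upsample_nearest`, and `pyramid_pool` are hypothetical and are not taken from the paper; the 3×3 and 1×1 convolutional fusion layers are omitted.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def avg_pool(feature: Grid, window: Tuple[int, int]) -> Grid:
    """Non-overlapping average pooling with a (height, width) window.

    Trailing rows/columns that do not fill a whole window are dropped.
    """
    win_h, win_w = window
    rows, cols = len(feature), len(feature[0])
    if win_h > rows or win_w > cols:
        raise ValueError("pooling window is larger than the feature map")
    pooled = []
    for top in range(0, rows - win_h + 1, win_h):
        out_row = []
        for left in range(0, cols - win_w + 1, win_w):
            total = sum(
                feature[r][c]
                for r in range(top, top + win_h)
                for c in range(left, left + win_w)
            )
            out_row.append(total / (win_h * win_w))
        pooled.append(out_row)
    return pooled


def upsample_nearest(pooled: Grid, window: Tuple[int, int],
                     shape: Tuple[int, int]) -> Grid:
    """Repeat each pooled value over its window to restore `shape`."""
    win_h, win_w = window
    rows, cols = shape
    last_r, last_c = len(pooled) - 1, len(pooled[0]) - 1
    return [
        [pooled[min(r // win_h, last_r)][min(c // win_w, last_c)]
         for c in range(cols)]
        for r in range(rows)
    ]


def pyramid_pool(feature: Grid,
                 windows: Sequence[Tuple[int, int]]) -> List[Grid]:
    """Return one same-sized branch per window (assumed non-square)."""
    shape = (len(feature), len(feature[0]))
    return [upsample_nearest(avg_pool(feature, w), w, shape) for w in windows]


if __name__ == "__main__":
    feature = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
    # Hypothetical window shapes: wide, tall, and coarse.
    for branch in pyramid_pool(feature, [(1, 2), (2, 1), (2, 4)]):
        print(branch)
```

In a real network each branch would be a multi-channel tensor, and the branches would be concatenated along the channel axis before the convolutional fusion layers; the sketch keeps a single channel so that the effect of the window shape is easy to inspect.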

CLC number: TP18

WANG Jinhe, SU Cuili, MENG Fanyun, CHE Zhilong, TAN Hao, ZHANG Nan. Stereo Matching Network Based on Asymmetric Spatial Pyramid Pooling[J]. Computer Engineering, 2020, 46(7): 228-234, 242.

http://www.ecice06.com/CN/Y2020/V46/I7/228

