Survey of Binocular Stereo-matching Methods Based on Deep Learning

doi:10.19678/j.issn.1000-3428.0064294

Abstract

Abstract: Binocular stereo matching is a classical problem in computer vision and is widely used in tasks such as autonomous driving, remote sensing, and robot perception. Its main goal is to find the correspondences between same-named (homologous) points in a binocular image pair and to recover image depth information by triangulation. In recent years, deep-learning-based stereo-matching methods have far outperformed traditional methods in both matching accuracy and matching efficiency. This survey divides existing deep-learning-based stereo-matching methods into non-end-to-end and end-to-end methods. Non-end-to-end methods use a deep neural network to replace one step of the traditional stereo-matching pipeline; depending on which step is replaced, they fall into three classes: methods based on cost-computation networks, on cost-aggregation networks, and on disparity-refinement networks. End-to-end methods are divided, according to the dimensionality of the cost volume, into 3D-cost-volume-based and 4D-cost-volume-based methods. Representative non-end-to-end and end-to-end methods are analyzed in terms of matching accuracy, time complexity, and application scenarios, and the advantages and limitations of each class are summarized. On this basis, the main challenges currently facing deep-learning-based stereo matching are summarized and future research directions are outlined.

Key words: computer vision, deep learning, binocular images, stereo-matching method, image depth
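
The abstract refers to two technical ideas that a short sketch can make concrete: recovering depth from disparity by triangulation, and the difference between a 3D and a 4D cost volume. The NumPy code below is only an illustration under stated assumptions, not the method of any surveyed paper. It assumes a rectified image pair (so depth is Z = f·B/d), and it assumes that the 3D volume is built by correlating left and right features while the 4D volume is built by concatenating them, which is one common construction. All function and parameter names (`disparity_to_depth`, `correlation_cost_volume`, `concat_cost_volume`, `focal_px`, `baseline_m`, `max_disp`) are hypothetical.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Triangulation for a rectified pair: Z = f * B / d.

    Pixels with non-positive disparity get depth = inf (no valid match).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


def correlation_cost_volume(left_feat, right_feat, max_disp):
    """3D cost volume of shape (D, H, W).

    Inputs are feature maps of shape (C, H, W). For each candidate
    disparity d, left pixel x is compared with right pixel x - d, and
    the feature channels are collapsed by a mean of products.
    """
    _, h, w = left_feat.shape
    volume = np.zeros((max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left_feat * right_feat).mean(axis=0)
        else:
            prod = left_feat[:, :, d:] * right_feat[:, :, :-d]
            volume[d, :, d:] = prod.mean(axis=0)
    return volume


def concat_cost_volume(left_feat, right_feat, max_disp):
    """4D cost volume of shape (2C, D, H, W).

    The feature channels are kept: left features and the right features
    shifted by d are stacked along the channel axis for every d.
    """
    c, h, w = left_feat.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[:c, d] = left_feat
            volume[c:, d] = right_feat
        else:
            volume[:c, d, :, d:] = left_feat[:, :, d:]
            volume[c:, d, :, d:] = right_feat[:, :, :-d]
    return volume


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal((4, 3, 8)).astype(np.float32)
    right = rng.standard_normal((4, 3, 8)).astype(np.float32)
    print(correlation_cost_volume(left, right, 3).shape)  # (D, H, W)
    print(concat_cost_volume(left, right, 3).shape)       # (2C, D, H, W)
    print(disparity_to_depth([[10.0, 0.0]], focal_px=700.0, baseline_m=0.5))
```

In this sketch the 3D volume discards the channel axis and can be processed with 2D convolutions, while the 4D volume keeps it and is typically processed with 3D convolutions, which is why the two families trade accuracy against memory and run time differently.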

CLC number: TP391

YIN Chenyang, ZHI Henghui, LI Huibin. Survey of Binocular Stereo-matching Methods Based on Deep Learning[J]. Computer Engineering, 2022, 48(10): 1-12.

https://www.ecice06.com/CN/Y2022/V48/I10/1


