Stereo Matching Algorithm Based on Atrous Convolution and Attention Module

doi:10.19678/j.issn.1000-3428.0065628

Abstract:

Most stereo matching algorithms based on convolutional neural networks require a large receptive field. In most of them, however, the number of parameters grows sharply as the receptive field is enlarged, so they demand a large amount of training data. This paper proposes a stereo matching algorithm based on atrous convolution and attention modules. An atrous convolution module combines a residual structure with atrous convolution to enlarge the receptive field of the network with few parameters. An attention module integrates multi-level information through convolutions at different levels, which makes the extracted information more complete. A spatial pyramid pooling module uses weighted pyramid pooling to enlarge the receptive field of the model and to assign different degrees of importance to information at different levels. Experimental results show that, with the same dataset and the same number of training iterations, the proposed algorithm converges faster than DispNetC and other algorithms. It also has a simple structure and few parameters, and is suitable for small-sample data.

Key words: stereo matching, small-sample data, atrous convolution, attention module, pyramid pooling
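The abstract's claim that atrous convolution enlarges the receptive field with few parameters can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes a 3×3 kernel and a hypothetical width of 32 channels, and the helper names are invented for illustration. It compares the weight count of an atrous convolution with that of a dense convolution covering the same spatial extent.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent (along one axis) covered by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights of a 2D convolution, bias ignored. Dilation does not change this."""
    return kernel_size * kernel_size * in_channels * out_channels


if __name__ == "__main__":
    channels = 32  # assumed channel width, for illustration only
    for dilation in (1, 2, 3):
        extent = effective_kernel_size(3, dilation)
        atrous = conv_weight_count(3, channels, channels)
        dense = conv_weight_count(extent, channels, channels)
        print(f"dilation={dilation}: extent {extent}x{extent}, "
              f"atrous weights={atrous}, dense weights={dense}")
```

Under these assumptions the atrous layer keeps the same number of weights at every dilation factor, while a dense kernel of equal extent grows quadratically with that extent.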

Zhihao LIU, Fanyun MENG, Jinhe WANG, Nan ZHANG. Stereo Matching Algorithm Based on Atrous Convolution and Attention Module[J]. Computer Engineering, 2023, 49(8): 223-231.

https://www.ecice06.com/CN/Y2023/V49/I8/223

Figures/Tables (17)

Fig.1 Architecture of the proposed stereo matching algorithm

Fig.2 Structure of the atrous convolution module

Fig.3 Comparison of atrous convolution with and without padding at the same dilation factor

Fig.4 Structure of the feature extraction module (FEM)

Table 1 Parameters of the FEM module

Layer	Type	Output	s	T
1	Conv	32	1	—
2	ACM	32	1	2
3, 4	ACM	32	1	3
5	Conv	64	2	—
6	Conv	128	1	—

Fig.5 Structure of the FPA module

Fig.6 Structure of the DIM module

Table 2 Parameters of the DIM module

Layer	Type	Output	s	T
7~9	ACM	128	2	2
10	FPA	128	—	—
11~12	Conv	64	1	—
13~14	Conv	32	1	—

Fig.7 Structure of the DRM module

Table 3 Parameters of the DRM module

Layer	Type	Output	s	T
15	Conv	32	1	—
16~18	ACM	32	1	2
19	SPPSA	32	—	—
20	ACM	32	1	2
21	Conv	1	1	—

Table 4 Performance comparison of different structures within the atrous convolution module (%)

Atrous convolution module type	Padding	Shortcut connection	Epe	Ed1	R1	R3	R5
Baseline model	—	—	2.078	12.184	32.479	12.544	8.287
prototype 1	×	×	1.630	9.456	27.133	9.905	6.627
prototype 2	√	×	1.839	10.049	27.431	10.396	7.133
prototype 3	√	√	1.549	8.882	26.170	9.396	6.195
prototype 4	×	√	1.556	8.882	26.105	9.396	6.179
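The column names Epe, Ed1, R1, R3 and R5 are not defined on this page. The sketch below shows how such metrics are commonly computed, under assumed definitions that may differ from the paper's: Epe as the mean absolute disparity error, Rk as the percentage of pixels whose error exceeds k pixels, and Ed1 as the KITTI-style D1 rate (error above 3 pixels and above 5% of the true disparity). The function name and the flat-list inputs are hypothetical, and no mask for invalid pixels is applied.

```python
from typing import Dict, Sequence


def disparity_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Compute error metrics over flattened disparity maps (assumed definitions)."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")

    errors = [abs(p - g) for p, g in zip(pred, gt)]
    n = len(errors)

    def percentage(flags) -> float:
        # Share of pixels for which the condition holds, in percent.
        return 100.0 * sum(flags) / n

    return {
        "Epe": sum(errors) / n,  # mean absolute error
        # Assumed KITTI-style D1: > 3 px and > 5% of the true disparity.
        "Ed1": percentage(e > 3.0 and e > 0.05 * abs(g) for e, g in zip(errors, gt)),
        "R1": percentage(e > 1.0 for e in errors),
        "R3": percentage(e > 3.0 for e in errors),
        "R5": percentage(e > 5.0 for e in errors),
    }


if __name__ == "__main__":
    predicted = [10.0, 20.5, 33.5, 46.0]
    ground_truth = [10.0, 20.0, 30.0, 40.0]
    print(disparity_metrics(predicted, ground_truth))
```

Lower values are better for every metric in this sketch, which matches how the tables on this page are read.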

Table 5 Comparison of model performance with different numbers of atrous convolution modules (T=2) (%)

Parameter L	Epe	Ed1	R1	R3	R5
1	1.853	10.926	29.445	11.305	7.623
2	1.765	10.438	29.234	10.828	7.164
3	1.674	9.704	27.690	10.086	6.723
4	1.688	10.097	26.966	10.542	7.098
5	1.699	9.948	28.126	10.369	6.970

Table 6 Influence of the dilation factor and the number of layers in the atrous convolution module on model performance (%)

Parameter L	Parameter T	Epe	Ed1	R1	R3	R5
2	2, 3	1.567	9.011	26.104	9.476	6.290
2	3, 3	1.547	8.858	26.387	9.370	6.145
3	2, 2, 2	1.674	9.704	27.690	10.086	6.723
3	2, 3, 2	1.638	9.476	27.482	9.918	6.540
3	2, 3, 3	1.483	8.530	25.823	9.054	5.802
3	2, 3, 4	1.727	9.882	28.559	10.318	6.732
3	3, 3, 3	1.588	8.995	26.291	9.434	6.277
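To relate the T settings in Table 6 to receptive field size, the sketch below computes the nominal receptive field of a stack of dilated convolutions. It rests on assumptions not stated on this page: each atrous convolution module is treated as a single 3×3 convolution with stride 1, and the function name is hypothetical. The real modules may contain more layers, so the numbers are indicative only.

```python
from typing import Iterable


def stacked_receptive_field(dilations: Iterable[int], kernel_size: int = 3) -> int:
    """Receptive field (one axis) of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation.
        field += (kernel_size - 1) * dilation
    return field


if __name__ == "__main__":
    # Dilation settings listed in Table 6.
    configs = [(2, 3), (3, 3), (2, 2, 2), (2, 3, 2), (2, 3, 3), (2, 3, 4), (3, 3, 3)]
    for config in configs:
        print(config, stacked_receptive_field(config))
```

Under these assumptions the setting 2, 3, 3 yields a nominal field of 17, while 2, 3, 4 and 3, 3, 3 both yield 19 yet show higher errors in Table 6, so a larger nominal receptive field does not by itself correspond to better accuracy in these results.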

Table 7 Performance comparison between the FPA module and the SPPSA module (%)

No.	FPA	ASPP	Epe	Ed1	R3
1	×	√	1.713	9.210	9.590
2	√	×	1.735	9.965	10.354
3	√	√	1.483	8.530	9.054

Table 8 Performance comparison with other algorithms (%)

Algorithm	Epe	Ed1	R1	R3	R5
DispNetC^[1]	2.295	15.293	46.158	15.906	9.187
PSMNet^[11]	7.634	54.645	80.235	54.203	35.645
MLTNet^[27]	6.155	46.643	79.353	46.744	29.732
MBFNet^[32]	6.658	48.921	78.427	49.005	33.784
Proposed algorithm	1.483	8.502	25.823	9.054	5.802

Fig.8 Running results of different algorithms

Fig.9 Comparison of visualization results of different algorithms

References (32)

1	MAYER N, ILG E, HÄUSSER P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 4040-4048.
2	ŽBONTAR J, LECUN Y. Computing the stereo matching cost with a convolutional neural network[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2015: 1592-1599.
3	ŽBONTAR J, LECUN Y. Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research, 2016, 17(1): 2287-2318.
4	LUO W J, SCHWING A G, URTASUN R. Efficient deep learning for stereo matching[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 5695-5703.
5	PARK H, LEE K M. Look wider to match image patches with convolutional neural networks. IEEE Signal Processing Letters, 2017, 24(12): 1788-1792. doi: 10.1109/LSP.2016.2637355
6	CHEN Z Y, SUN X, WANG L, et al. A deep visual correspondence embedding model for stereo matching costs[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2016: 972-980.
7	GÜNEY F, GEIGER A. Displets: resolving stereo ambiguities using object knowledge[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2015: 4165-4175.
8	XIAO J S, TIAN H, ZOU W T, et al. Stereo matching based on convolutional neural network. Acta Optica Sinica, 2018, 38(8): 0815017. doi: 10.3788/AOS201838.0815017
9	KENDALL A, MARTIROSYAN H, DASGUPTA S, et al. End-to-end learning of geometry and context for deep stereo regression[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2017: 66-75.
10	ZHAO Q. Research of stereo matching method based on 3D convolution module and parallax segmentation. Electronic Measurement Technology, 2021, 44(18): 72-77. (in Chinese)
11	CHANG J R, CHEN Y S. Pyramid stereo matching network[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 5410-5418.
12	HUANG Y J, ZHU J P, YANG S M. Stereo matching algorithm based on attention mechanism. Computer Applications and Software, 2022, 39(7): 235-240, 309. (in Chinese)
13	DU X, EL-KHAMY M, LEE J. AMNet: deep atrous multiscale stereo disparity estimation networks[EB/OL]. [2022-07-20]. https://arxiv.org/abs/1904.09099.
14	LIU S G, ZHANG T, YANG J G, et al. Progressive dilation residual network for deep binocular stereo matching. Journal of Xidian University, 2022, 32(5): 175-180. (in Chinese)
15	LIANG Z F, GUO Y L, FENG Y L, et al. Stereo matching using multi-level cost volume and multi-scale feature constancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 300-315. doi: 10.1109/TPAMI.2019.2928550
16	YAO C T, JIA Y D, DI H J, et al. A decomposition model for stereo matching[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2021: 6087-6096.
17	ZHANG X Y, WANG H B, BIAN J L. Multi-cost fusion stereo matching network. Computer Engineering, 2022, 48(2): 186-193. (in Chinese)
18	XU G W, CHENG J D, GUO P, et al. Attention concatenation volume for accurate and efficient stereo matching[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2022: 12971-12980.
19	CHENG X L, ZHONG Y R, HARANDI M, et al. Hierarchical neural architecture search for deep stereo matching[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 22158-22169.
20	LIU B Y, YU H M, LONG Y Q. Local similarity pattern and cost self-reassembling for deep stereo matching networks. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(2): 1647-1655. doi: 10.1609/aaai.v36i2.20056
21	HE P Y, HUANG J S. An improved method of stereo matching network combined with semantic cost volume. Journal of Navigation and Positioning, 2022, 31(6): 157-164. (in Chinese)
22	LIU Z G, LI Z, SONG T T, et al. Stereo matching network combining deformable convolution and bilateral grid. Computer Engineering, 2022, 48(12): 241-247, 254. (in Chinese)
23	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 770-778.
24	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning. New York, USA: ACM Press, 2015: 448-456.
25	MAAS A L. Rectifier nonlinearities improve neural network acoustic models[EB/OL]. [2022-07-20]. https://www.semanticscholar.org/paper/Rectifier-Nonlinearities-Improve-Neural-Network-Maas/367f2c63a6f6a10b3b64b8729d601e69337ee3cc.
26	LI H, XIONG P, AN J, et al. Pyramid attention network for semantic segmentation[EB/OL]. [2022-07-20]. https://arxiv.org/abs/1805.10180.
27	WANG Y F, WANG H W, LIU Y, et al. Stereo matching algorithm based on multi-task learning. Laser & Optoelectronics Progress, 2021, 58(4): 0415010. (in Chinese)
28	GIRSHICK R. Fast R-CNN[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2016: 1440-1448.
29	GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2012: 3354-3361.
30	MENZE M, GEIGER A. Object scene flow for autonomous vehicles[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2015: 3061-3070.
31	KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2022-07-20]. https://arxiv.org/abs/1412.6980.
32	WANG Y F, WANG H W, LIU Y, et al. Progressive thinning real-time stereo matching algorithm. Acta Optica Sinica, 2020, 40(9): 0915002. (in Chinese)

