Video Saliency Detection Based on Multi-Stream Network Consistency

doi:10.19678/j.issn.1000-3428.0060268

Abstract

Abstract: Existing video saliency detection algorithms usually adopt a dual-stream structure to extract the spatio-temporal cues of a video. The motion branch of this structure suffers from low motion-estimation accuracy when salient objects move violently or slowly, and unreasonable training data or training schemes bias the weights towards a single branch. To address these problems, this paper proposes MSNC, a video saliency detection algorithm based on multi-stream network consistency. A new triple network structure is designed to extract the color information, temporal information, and prior features of the preselected target region; the prior features compensate for the defects of the motion stream and improve the utilization of motion cues. A multi-stream consistency fusion model optimizes the three stream branches to obtain the best fusion scheme for the different features. In addition, a cyclic training strategy balances the weights of the triple network so that the network does not overfit a single stream branch, which effectively improves the accuracy of motion estimation and localization. Experimental results on the Davis dataset show that the proposed algorithm is more robust than PCSA, SSAV, and MGA, with maxF and S-Measure values of 0.893 and 0.912, respectively, and an MAE of only 0.021.

Key words: video saliency detection, motion information, prior information, multi-stream consistency fusion, channel attention mechanism
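
Two of the metrics quoted in the abstract, MAE and maxF, have simple definitions that a short sketch can make concrete. The code below is an illustration only and is not taken from the paper: the function names `mae` and `max_f_measure` are hypothetical, the weighting beta^2 = 0.3 is the convention commonly used in salient object detection and is assumed here, and the 256-step threshold sweep is likewise an assumption. S-Measure is not covered.

```python
from typing import Sequence

# A 2-D saliency map or ground-truth mask with values in [0, 1].
Map = Sequence[Sequence[float]]


def mae(pred: Map, gt: Map) -> float:
    """Mean absolute error between a saliency map and a ground-truth mask."""
    total = 0.0
    count = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            total += abs(p - g)
            count += 1
    return total / count


def max_f_measure(pred: Map, gt: Map, beta_sq: float = 0.3,
                  steps: int = 256) -> float:
    """Largest F-measure over `steps` evenly spaced binarisation thresholds.

    beta_sq = 0.3 is an assumed convention, not a value stated in the source.
    """
    flat_p = [p for row in pred for p in row]
    flat_g = [g >= 0.5 for row in gt for g in row]
    positives = sum(flat_g)
    best = 0.0
    for k in range(steps):
        t = k / (steps - 1)
        tp = sum(1 for p, g in zip(flat_p, flat_g) if p >= t and g)
        fp = sum(1 for p, g in zip(flat_p, flat_g) if p >= t and not g)
        if tp == 0:
            continue  # precision and recall are both zero; F stays at 0
        precision = tp / (tp + fp)
        recall = tp / positives
        f = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
        best = max(best, f)
    return best


if __name__ == "__main__":
    prediction = [[0.9, 0.1], [0.8, 0.2]]
    mask = [[1, 0], [1, 0]]
    print(mae(prediction, mask), max_f_measure(prediction, mask))
```

Lower MAE is better, while maxF approaches 1 as the thresholded map matches the mask; in practice both are usually averaged over all frames of a dataset.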

CLC number: TP391.41

SONG Jia, CHEN Chenglizhao. Video Saliency Detection Based on Multi-Stream Network Consistency[J]. Computer Engineering, 2022, 48(2): 215-223.

https://www.ecice06.com/CN/Y2022/V48/I2/215

Figures/Tables: 11

