Video Interpolation Based on Compression and Refined Deep Voxel Flow Model

doi:10.19678/j.issn.1000-3428.0062586

Abstract

Abstract: Video interpolation synthesizes intermediate frames from the image information of adjacent video frames and can be applied directly to slow-motion playback, high-frequency video synthesis, and animation production. Existing video interpolation models based on Deep Voxel Flow (DVF) suffer from low synthesis accuracy and a large number of parameters, which limits their deployment on mobile terminals. This study proposes a compression-driven refined DVF interpolation model. The DVF model is first pre-trained to improve interpolation quality and to determine high-precision parameters. Sparse compression is then used to prune the convolution channels, which reduces the number of parameters and yields a coarse voxel flow. The input video frames, the coarse voxel flow, and the coarse intermediate frame are fed to the refined voxel flow network to obtain a fine voxel flow. On this basis, the fine intermediate frame is computed by trilinear interpolation, which strengthens the model's ability to capture edge information and thereby improves the quality of the intermediate frame. Experimental results on the Vimeo 90K and UCF101 datasets show that, compared with DVF, SepConv, CDFI, and other models, the proposed model improves the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) by 1.59 dB and 0.015 on average, respectively. The model thus effectively improves video synthesis quality while keeping the increase in parameter count small.

Key words: video interpolation, pre-training model, parameter compression, Convolutional Neural Network (CNN), refined Deep Voxel Flow (DVF) model
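The trilinear synthesis step and the PSNR metric mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the usual DVF convention in which a voxel flow (dx, dy, dt) samples the previous frame at (x - dx, y - dy) and the next frame at (x + dx, y + dy), then blends the two samples with the temporal weight dt. The function names `bilinear_sample`, `synthesize_frame`, and `psnr` are hypothetical, and frames are assumed to be float arrays of shape (H, W, C) in [0, 1].

```python
import numpy as np


def bilinear_sample(img, xs, ys):
    """Bilinearly sample img (H, W, C) at float pixel coordinates xs, ys (H, W)."""
    h, w = img.shape[:2]
    # Clamp coordinates to the image so border pixels are replicated.
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (xs - x0)[..., None]  # horizontal weight, broadcast over channels
    wy = (ys - y0)[..., None]  # vertical weight, broadcast over channels
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def synthesize_frame(frame0, frame1, flow_xy, dt):
    """Blend two frames along an assumed voxel flow.

    flow_xy: (H, W, 2) spatial flow (dx, dy); dt: (H, W) temporal weight in [0, 1].
    Spatial bilinear sampling plus the linear blend in time is the trilinear step.
    """
    h, w = frame0.shape[:2]
    ys, xs = np.meshgrid(np.arange(h, dtype=float),
                         np.arange(w, dtype=float), indexing="ij")
    dx, dy = flow_xy[..., 0], flow_xy[..., 1]
    prev_sample = bilinear_sample(frame0, xs - dx, ys - dy)
    next_sample = bilinear_sample(frame1, xs + dx, ys + dy)
    weight = dt[..., None]
    return (1 - weight) * prev_sample + weight * next_sample


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(peak ** 2 / mse))
```

In this sketch, a coarse-to-fine scheme like the one the abstract describes would call `synthesize_frame` twice: once with the coarse voxel flow to get the coarse intermediate frame, and once with the refined flow to get the fine intermediate frame, with `psnr` used to compare either result against the ground-truth frame.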

CLC Number: TP391

RU Niuniu, YU Jinwei, YANG Weihua, BIAN Wei. Video Interpolation Based on Compression and Refined Deep Voxel Flow Model[J]. Computer Engineering, 2022, 48(9): 248-253.

茹妞妞, 于晋伟, 杨卫华, 卞玮. 基于压缩与精化深度体素流模型的视频插值[J]. 计算机工程, 2022, 48(9): 248-253.


URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0062586

http://www.ecice06.com/EN/Y2022/V48/I9/248

Figures/Tables 7

