Computer Engineering ›› 2022, Vol. 48 ›› Issue (9): 248-253. doi: 10.19678/j.issn.1000-3428.0062586

• Graphics and Image Processing •

Video Interpolation Based on Compression and Refined Deep Voxel Flow Model

RU Niuniu, YU Jinwei, YANG Weihua, BIAN Wei   

  1. College of Mathematics, Taiyuan University of Technology, Taiyuan 030000, China
  • Received: 2021-09-07 Revised: 2021-10-19 Published: 2021-10-25

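  • About the authors: RU Niuniu (born 1996), female, master's student, main research interest: image processing; YU Jinwei, lecturer, Ph.D.; YANG Weihua (corresponding author), professor, Ph.D.; BIAN Wei, master's student.
  • Funding:
    National Natural Science Foundation of China (11671296).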

Abstract: Video interpolation synthesizes intermediate frames from the image information of adjacent frames in a video and can be applied directly to slow-motion playback, high-frame-rate video synthesis, and animation production. Existing video interpolation models based on Deep Voxel Flow (DVF) suffer from low synthesis accuracy and a large number of parameters, which limits their deployment on mobile devices. This study proposes a compression-driven refined DVF interpolation model. The DVF model is first pre-trained to improve interpolation quality and determine high-precision parameters. Sparse compression is then used to prune the convolution channels, which reduces the number of parameters and yields a coarse voxel flow. The input video frames, the coarse voxel flow, and the coarse intermediate frame are fed to a refined voxel flow network to obtain the refined voxel flow. On this basis, the refined intermediate frame is computed by trilinear interpolation, which enhances the model's ability to capture edge information and thereby improves the quality of the intermediate frame. Experimental results on the Vimeo 90K and UCF101 datasets show that, compared with DVF, SepConv, CDFI, and other models, the proposed model improves the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) by an average of 1.59 dB and 0.015, respectively. The proposed model therefore effectively improves video synthesis quality while keeping the increase in parameter count small.
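
The trilinear interpolation step named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the formulation commonly used for DVF-style models: a per-pixel voxel flow (dx, dy, dt), where (dx, dy) displaces the sampling position symmetrically in the two input frames and dt blends the two bilinearly sampled values. The function names, the grayscale (H, W) frame layout, and the border clamping are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def bilinear_sample(img, xs, ys):
    """Sample a grayscale image (H, W) at float coordinates, clamping to the border."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = xs - x0  # horizontal weight of the right-hand neighbour
    wy = ys - y0  # vertical weight of the lower neighbour
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def synthesize_frame(frame0, frame1, dx, dy, dt):
    """Blend two frames along a voxel flow (dx, dy, dt), each of shape (H, W).

    Assumed convention: frame0 is sampled at (x - dx, y - dy), frame1 at
    (x + dx, y + dy), and dt in [0, 1] is the temporal blending weight.
    """
    h, w = frame0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    from0 = bilinear_sample(frame0, xs - dx, ys - dy)
    from1 = bilinear_sample(frame1, xs + dx, ys + dy)
    return (1 - dt) * from0 + dt * from1
```

With a zero spatial flow and dt = 0.5 everywhere, the sketch reduces to a plain average of the two frames, which is a convenient sanity check. In the model described above, the flow would be predicted by a network rather than supplied by hand.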

Key words: video interpolation, pre-training model, parameter compression, Convolutional Neural Network (CNN), refined Deep Voxel Flow (DVF) model
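
The peak signal-to-noise ratio (PSNR) figure quoted in the abstract is derived from the mean squared error between a synthesized frame and the ground-truth frame. The sketch below uses the standard definition; the function name and the 8-bit default peak value of 255 are assumptions for illustration, since the abstract does not state the evaluation settings.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs: PSNR is unbounded
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the error, a gain of 1.59 dB on a single frame corresponds to dividing the mean squared error by roughly 1.44. An average gain over a dataset does not translate this exactly, so the figure is only a rough guide.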
