Computer Engineering ›› 2022, Vol. 48 ›› Issue (7): 277-283. doi: 10.19678/j.issn.1000-3428.0061913

• Graphics and Image Processing •

Dual-Process Short Video Classification Method Based on Deep Learning

ZHANG Aihan, LIU Xiang, SHI Yunyu, LIU Siqi   

  1. School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2021-06-15 Revised: 2021-08-04 Online: 2022-07-15 Published: 2021-09-06

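  • About the authors: ZHANG Aihan (born 1995), female, master's student, main research interest: computer vision; LIU Xiang (corresponding author), associate professor, Ph.D.; SHI Yunyu, lecturer, Ph.D.; LIU Siqi, undergraduate student.
  • Supported by:
    Science and Technology Innovation Project of the Ministry of Culture (2015KJCXXM19).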

Abstract: As smartphones and 5G networks have become widespread, short videos have become a main way for people to acquire knowledge in their spare moments. To address the shortage of short video datasets for real-life scenarios and the low accuracy of short video classification, this study proposes a dual-process short video classification method that integrates deep learning. In the main process, an A-VGG-3D network model is constructed: a VGG network with an attention mechanism extracts features, and an optimized 3D Convolutional Neural Network (3DCNN) classifies the short videos, which improves continuity, balance, and robustness in the temporal dimension. In the auxiliary process, the frame difference method is used to detect shot changes and extract several frames from each short video. Multi-scale face detection is then performed on the extracted frames by combining a sliding window mechanism with a cascade classifier, which further improves classification accuracy. Experimental results show that, for non-plot and non-interview short videos on the UCF101 dataset and a self-built dataset of real-life short videos, the precision and recall of the method reach up to 98.9% and 98.6%, respectively. Compared with a short video classification method based on the C3D network, the proposed method improves classification accuracy on the UCF101 dataset by 9.7 percentage points, indicating broader applicability.

Key words: 3D Convolutional Neural Network(3DCNN), deep learning, VGG network, attention mechanism, short video classification
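
The abstract names the frame difference method for shot-change detection but gives no implementation details. The following is a minimal sketch of the general idea, not the authors' code: it assumes grayscale frames stored as a NumPy array, and the function names and the `threshold` value are hypothetical choices made only for illustration.

```python
import numpy as np


def detect_shot_changes(frames, threshold=30.0):
    """Return the indices of frames that start a new shot.

    frames: array of shape (T, H, W) holding grayscale frames.
    threshold: assumed cut-off on the mean absolute difference
        between consecutive frames (not taken from the paper).
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Mean absolute pixel difference between each frame and its predecessor.
    diffs = np.abs(frames[1:] - frames[:-1]).mean(axis=(1, 2))
    # diffs[k] compares frame k + 1 with frame k, hence the + 1 offset.
    return [int(i) + 1 for i in np.flatnonzero(diffs > threshold)]


def select_key_frames(frames, threshold=30.0):
    """Pick the first frame of every detected shot, frame 0 included."""
    return [0] + detect_shot_changes(frames, threshold)


# Synthetic clip: three dark frames followed by three bright frames.
clip = np.concatenate([np.zeros((3, 4, 4)), np.full((3, 4, 4), 200.0)])
key_frames = select_key_frames(clip)
```

In a pipeline like the one described, the selected frames would then be passed to the face detector; a fixed global threshold is the simplest choice and may need tuning per dataset.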

