
Computer Engineering ›› 2022, Vol. 48 ›› Issue (11): 137-144. doi: 10.19678/j.issn.1000-3428.0062691

• Cyberspace Security •

Video Anomaly Detection Based on an Improved Temporal Segment Network

HUANG Tao1, WU Kaijun1, WANG Dicong1,2, BAI Chenshuai1, TAO Xiaomiao1

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China;
    2. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  • Received: 2021-09-14  Revised: 2022-03-17  Published: 2022-07-18
  • Author Bio: HUANG Tao (born 1996), male, M.S. candidate; his research focuses on video anomaly detection. WU Kaijun (corresponding author), professor, Ph.D., doctoral supervisor. WANG Dicong, Ph.D. candidate. BAI Chenshuai, M.S. candidate. TAO Xiaomiao, lecturer and Ph.D. candidate.
  • Funding:
    National Natural Science Foundation of China (61966022); "Innovation Star" Project for Outstanding Graduate Students of the Gansu Provincial Department of Education (2021CXZX-555).



Abstract: Video anomaly detection is an important research topic in computer vision and is widely used in road surveillance and abnormal-event monitoring. Considering the obvious differences in appearance and motion characteristics between abnormal and normal behavior, an improved temporal segment network is proposed that learns the appearance and motion information in video in order to predict abnormal behavior. To extract more video information, RGB images and RGB frame-difference images are fused as input: appearance information is extracted from the RGB images, while more effective motion features are obtained from the frame-difference images. A convolutional attention module is added to the temporal segment network to learn attention maps along two dimensions, spatial and channel, and the learned attention weights are used to better distinguish abnormal from normal video clips. In addition, the Focal Loss function reduces the weight of the many easy negative samples during training, allowing the model to focus on hard-to-classify samples and thereby addressing the imbalance between positive and negative samples in video anomaly detection. Experimental results show that the improved temporal segment network achieves Area Under the Curve (AUC) values of 77.6% and 83.3% on the UCF-Crime and CUHK Avenue datasets, respectively, outperforming the baseline TSN (RGB stream) as well as ISTL, 3D-ConvAE, and other methods.
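The two ingredients named in the abstract, RGB frame differencing as a cheap motion cue and the Focal Loss for class imbalance, can be sketched as follows. This is an illustrative NumPy sketch, not the paper's code: the function names, array shapes, and the defaults α = 0.25 and γ = 2 are assumptions taken from the standard Focal Loss formulation FL(p_t) = −α_t(1 − p_t)^γ log(p_t).

```python
import numpy as np

def frame_difference(frames):
    """Absolute difference between consecutive RGB frames.

    frames: array of shape (T, H, W, 3); returns (T-1, H, W, 3).
    Large values mark pixels that changed between frames, a simple
    stand-in for motion information alongside the raw RGB stream.
    """
    frames = frames.astype(np.float32)
    return np.abs(frames[1:] - frames[:-1])

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary Focal Loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive (anomalous) class;
    y: ground-truth label (1 = anomalous, 0 = normal).
    The (1 - p_t)**gamma factor down-weights easy, well-classified
    samples, so the many easy negatives contribute little to training.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)          # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With γ = 0 and α = 1 the expression reduces to plain cross-entropy −log(p_t); raising γ shrinks the loss of confidently classified samples, which is how the abstract's imbalance problem is mitigated.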

Key words: video anomaly detection, convolutional attention mechanism, RGB frame-difference image, Focal Loss (FL) function, temporal segment network
