Computer Engineering, 2022, Vol. 48, Issue (10): 169-175. doi: 10.19678/j.issn.1000-3428.0063505

• Cyberspace Security •

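Video Anomaly Detection Algorithm Based on Object Spatio-Temporal Context Fusion

GU Ping, QIU Jiatao, LUO Changjiang, ZHANG Zhipeng

  1. School of Computing Science, Chongqing University, Chongqing 400044, China
  • Received: 2021-12-13  Revised: 2022-02-13  Published: 2022-03-21
  • About the authors: GU Ping (born 1976), male, associate professor, whose main research interests are data mining and machine learning; QIU Jiatao, LUO Changjiang, and ZHANG Zhipeng are master's students.
  • Funding:
    Key Project of the Chongqing Technology Innovation and Application Development Special Program (cstc2019jscx-gksbX0096).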

Abstract: Video anomaly detection aims to identify abnormal events in videos. The subjects of abnormal events are mostly objects such as people and vehicles, and each object carries rich spatio-temporal context information. However, most existing detection methods focus only on the temporal context and rarely consider the spatial context, which represents the relationship between the detected object and its surrounding objects. This paper proposes a video anomaly detection algorithm that fuses object spatio-temporal context. A Feature Pyramid Network (FPN) extracts the objects in each video frame to reduce background interference, and the optical flow map between every two adjacent frames is computed. A spatio-temporal two-stream network then encodes each object's RGB frame and optical flow map separately, yielding the object's appearance and motion features. On this basis, the multiple objects in a video frame are used to construct a spatial context with which the appearance and motion features are re-encoded, and the two-stream network reconstructs these features. The reconstruction error serves as the anomaly score, so that appearance anomalies and motion anomalies are detected jointly. Experimental results show that the proposed algorithm achieves frame-level AUCs of 98.5% and 86.3% on the UCSD-ped2 and Avenue datasets, respectively. On UCSD-ped2, the spatio-temporal two-stream network improves the frame-level AUC by 5.1 and 0.3 percentage points over using only the temporal stream and only the spatial stream, respectively, and adding spatial context encoding yields a further gain of 1 percentage point, which verifies the effectiveness of the fusion method.
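As a rough illustration of the scoring and evaluation steps summarized above, the following minimal sketch turns per-object reconstruction errors into frame scores and computes a frame-level AUC. It assumes each detected object already has an appearance feature, a motion feature, and the two-stream network's reconstructions of both. The equal weight `alpha`, taking the maximum over a frame's objects, per-video min-max normalization, and all function names are illustrative assumptions, not details given in the abstract. The AUC uses the standard rank-based (Mann-Whitney) formulation of ROC AUC.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (appearance, reconstructed appearance, motion, reconstructed motion)
ObjectFeatures = Tuple[Vector, Vector, Vector, Vector]


def mse(x: Vector, x_hat: Vector) -> float:
    """Mean squared reconstruction error between a feature and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def object_score(app: Vector, app_rec: Vector, mot: Vector, mot_rec: Vector,
                 alpha: float = 0.5) -> float:
    """Hypothetical joint score: weighted sum of appearance and motion errors."""
    return alpha * mse(app, app_rec) + (1.0 - alpha) * mse(mot, mot_rec)


def frame_scores(frames: List[List[ObjectFeatures]], alpha: float = 0.5) -> List[float]:
    """Score each frame by its most anomalous object (assumption); empty frames score 0."""
    scores = []
    for objects in frames:
        per_obj = [object_score(*o, alpha=alpha) for o in objects]
        scores.append(max(per_obj) if per_obj else 0.0)
    if not scores:
        return []
    # Min-max normalize within one video (assumption) so scores lie in [0, 1].
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def frame_level_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; label 1 = abnormal frame, 0 = normal.
    Tied scores receive their average rank."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both normal and abnormal frames")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    normal = ([0.1, 0.2], [0.1, 0.2], [0.0, 0.1], [0.0, 0.1])  # perfectly reconstructed
    odd = ([0.9, 0.9], [0.1, 0.1], [1.0, 0.0], [0.0, 1.0])     # large errors in both streams
    s = frame_scores([[normal], [normal, odd], [], [odd]])
    print(s)                                   # [0.0, 1.0, 0.0, 1.0]
    print(frame_level_auc(s, [0, 1, 0, 1]))    # 1.0
```

Because frame-level AUC depends only on how frames are ranked, any monotonic rescaling of the scores (such as the min-max step here) leaves the AUC of a single video unchanged; normalization matters only when scores from several videos are pooled.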

Key words: video anomaly detection, two-stream network, spatial context, AutoEncoder (AE), MemAE module
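The keywords name a MemAE module, but the abstract does not say where or how it is placed in the network. For orientation only, this sketch shows the memory-addressing step as it is commonly described for memory-augmented autoencoders (MemAE): a latent query z is compared with N memory items by cosine similarity, a softmax gives addressing weights, hard shrinkage with threshold λ (`shrink_threshold`) sparsifies them, and the read-out is the weighted sum of memory items. The toy memory here is fixed rather than learned, and the function names and the fallback used when every weight is shrunk to zero are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0 for a zero vector."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def memory_read(z: Sequence[float], memory: List[Sequence[float]],
                shrink_threshold: float, eps: float = 1e-12
                ) -> Tuple[List[float], List[float]]:
    """Hypothetical MemAE-style read: returns (reconstructed latent, addressing weights)."""
    if not memory:
        raise ValueError("memory must contain at least one item")
    # 1. Addressing weights: softmax over cosine similarity to each memory item.
    sims = [cosine(z, m) for m in memory]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]  # subtract max for numerical stability
    total = sum(exps)
    w = [e / total for e in exps]
    # 2. Hard shrinkage (continuous form): weights below the threshold become 0.
    w_hat = [max(wi - shrink_threshold, 0.0) * wi / (abs(wi - shrink_threshold) + eps)
             for wi in w]
    # 3. Re-normalize the surviving weights to sum to 1.
    s = sum(w_hat)
    if s == 0.0:
        w_hat = w  # assumption: fall back to dense weights if all were shrunk away
    else:
        w_hat = [wi / s for wi in w_hat]
    # 4. Read-out: weighted combination of memory items.
    dim = len(memory[0])
    z_hat = [sum(wi * m[d] for wi, m in zip(w_hat, memory)) for d in range(dim)]
    return z_hat, w_hat


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0]]  # toy "normal" prototypes
    z_hat, w_hat = memory_read([1.0, 0.1], memory, shrink_threshold=0.5)
    print(z_hat, w_hat)  # [1.0, 0.0] [1.0, 0.0]
```

Since the read-out can only combine stored items, a latent that lies far from every stored pattern is reconstructed poorly, which is what makes the reconstruction error usable as an anomaly score when the memory holds patterns of normal data.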

中图分类号: