Computer Engineering ›› 2022, Vol. 48 ›› Issue (10): 169-175. doi: 10.19678/j.issn.1000-3428.0063505

• Cyberspace Security •

Video Anomaly Detection Algorithm Based on Object Spatio-Temporal Context Fusion

GU Ping, QIU Jiatao, LUO Changjiang, ZHANG Zhipeng   

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Received: 2021-12-13  Revised: 2022-02-13  Published: 2022-03-21

  • About the authors: GU Ping (born 1976), male, associate professor; his main research interests are data mining and machine learning. QIU Jiatao, LUO Changjiang, and ZHANG Zhipeng are master's degree candidates.
  • Supported by the Key Project of the Chongqing Technology Innovation and Application Development Special Program (cstc2019jscx-gksbX0096).

Abstract: Video anomaly detection aims to identify abnormal events in videos. The subjects of abnormal events are mostly people, vehicles, and other objects, and each object in video data carries rich spatio-temporal context information. However, most existing detection methods focus only on the temporal context and disregard the spatial context, which represents the relationship between the detected object and its surrounding objects. This paper proposes a video anomaly detection algorithm that fuses object spatio-temporal context. Objects in each video frame are extracted with a Feature Pyramid Network (FPN) to reduce background interference, and the optical flow map between two adjacent frames is computed. The RGB patch and the optical flow map of each object are then encoded by a spatio-temporal two-stream network to obtain the object's appearance and motion features. On this basis, the multiple objects in a video frame are used to construct a spatial context, and the object appearance and motion features are re-encoded. Finally, the features above are reconstructed by the two-stream network, and the reconstruction error serves as the anomaly score, so that appearance and motion anomalies are detected jointly. Experimental results show that the proposed algorithm achieves frame-level AUCs of 98.5% and 86.3% on the UCSD-ped2 and Avenue datasets, respectively. On UCSD-ped2, the spatio-temporal two-stream network improves the frame-level AUC by 5.1 and 0.3 percentage points over networks using only the temporal stream or only the spatial stream, respectively, and spatial context encoding further improves it by 1 percentage point, verifying the effectiveness of the fusion method.
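The scoring scheme described in the abstract can be summarized in a short sketch. The following PyTorch code is a minimal illustration under stated assumptions, not the authors' implementation: the ConvAE class, the 64x64 object patch size, and the equal stream weights are illustrative choices, and the paper's MemAE modules and spatial-context re-encoding step are omitted for brevity.

# Minimal sketch of reconstruction-error scoring with a two-stream
# autoencoder (assumed architecture, not the paper's exact network).
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """A small convolutional autoencoder standing in for one stream."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, in_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

appearance_ae = ConvAE(in_ch=3)  # reconstructs RGB object patches
motion_ae = ConvAE(in_ch=2)      # reconstructs 2-channel optical flow maps

def anomaly_score(rgb_patch, flow_patch, w_app=1.0, w_mot=1.0):
    """Joint score: weighted sum of per-stream reconstruction errors.
    A higher reconstruction error means the object is more anomalous."""
    err_app = torch.mean((appearance_ae(rgb_patch) - rgb_patch) ** 2)
    err_mot = torch.mean((motion_ae(flow_patch) - flow_patch) ** 2)
    return w_app * err_app + w_mot * err_mot

# Example: one detected object as a 64x64 RGB crop plus its flow map.
rgb = torch.rand(1, 3, 64, 64)
flow = torch.rand(1, 2, 64, 64)
print(anomaly_score(rgb, flow).item())

In a complete system, a frame-level score would typically aggregate the per-object scores, for example by taking the maximum over all objects in the frame, before thresholding for anomaly decisions.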

Key words: video anomaly detection, two-stream network, spatial context, AutoEncoder (AE), MemAE module

