
Computer Engineering ›› 2025, Vol. 51 ›› Issue (2): 65-77. doi: 10.19678/j.issn.1000-3428.0069079

• Research Hotspots and Reviews •

Efficient Video Object Detection Based on Partitioning

HUANG Shuyi*, TAN Guang

  1. School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 518107, Guangdong, China
  • Received: 2023-12-22 Online: 2025-02-15 Published: 2024-05-20
  • Contact: HUANG Shuyi
  • Funding: National Natural Science Foundation of China General Program (62372488)

Abstract:

In video analysis tasks, balancing the accuracy requirements and computational costs of object detection with deep neural networks is challenging. Existing methods predominantly treat entire video frames as the units of computational resource allocation: to minimize computational costs while ensuring accuracy, they allocate more resources to frames with high information value and fewer or no resources to frames with low information value. However, this strategy overlooks the uneven distribution of objects of interest within each frame, so a full frame whose highly informative regions occupy only a small portion of the image may receive excessive resources, incurring unnecessary detection overhead. To address this issue, an efficient video object detection method based on partitioning is proposed. After a video frame is partitioned, the features of the objects of interest in each partition are rapidly extracted and processed, and a configuration mapping analyzer maps each partition to the detection configuration that satisfies the accuracy requirement at the lowest detection cost. Partitions that share the same detection configuration are then concatenated and detected together, further reducing the overall detection cost. Finally, a corrective step repairs the fragmentation of objects at partition edges, which would otherwise degrade detection accuracy. Experimental results demonstrate that, while meeting the accuracy requirements, the method significantly reduces computational costs, with a maximum saving of 90.84%. At comparable costs, it improves accuracy by 10.48% to 23.13% on average. The approach thus achieves efficient video object detection and adapts well to both static and dynamic scenarios.

Key words: video analysis, deep neural networks, object detection, partitioning, configuration mapping analyzer, concatenation detection
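
The following Python sketch is a minimal illustration of the pipeline the abstract describes: a frame is split into partitions, a stand-in analyzer assigns each partition a detection configuration, and partitions sharing a configuration are concatenated for a single detection pass. The configuration names, the edge-density heuristic used in place of the configuration mapping analyzer, and the detector_for_config mapping are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def partition_frame(frame, rows=2, cols=2):
        # Split a frame of shape (H, W, C) into an equal grid of partitions,
        # keyed by their (row, col) position.
        h, w = frame.shape[:2]
        ph, pw = h // rows, w // cols
        return [((r, c), frame[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
                for r in range(rows) for c in range(cols)]

    def map_configuration(partition):
        # Stand-in for the configuration mapping analyzer: pick the cheapest
        # configuration expected to meet the accuracy requirement. A simple
        # edge-density proxy replaces the learned analysis described in the paper.
        gray = partition.mean(axis=2)
        density = (np.abs(np.diff(gray, axis=0)).mean()
                   + np.abs(np.diff(gray, axis=1)).mean())
        if density < 5.0:
            return "small_low_res"
        if density < 15.0:
            return "medium"
        return "large_high_res"

    def detect_frame(frame, detector_for_config):
        # Group partitions by their mapped configuration, concatenate each group
        # into one batch, and run a single detection pass per configuration.
        groups = {}
        for key, part in partition_frame(frame):
            groups.setdefault(map_configuration(part), []).append((key, part))
        results = {}
        for config, members in groups.items():
            batch = np.stack([part for _, part in members])   # concatenated detection
            detections = detector_for_config[config](batch)   # hypothetical detector call
            for (key, _), det in zip(members, detections):
                results[key] = det
        # Repairing objects fragmented across partition borders (the paper's final
        # corrective step) is omitted from this sketch.
        return results

In practice, detector_for_config might map each configuration name to a detector variant (for example, a model run at a given input resolution), and a final step would merge boxes split across partition borders before reporting results.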
