
Computer Engineering ›› 2025, Vol. 51 ›› Issue (2): 65-77. doi: 10.19678/j.issn.1000-3428.0069079

• Research Hotspots and Reviews •

Efficient Video Object Detection Based on Partitioning

HUANG Shuyi*, TAN Guang

  1. School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 518107, Guangdong, China
  • Received: 2023-12-22 Online: 2025-02-15 Published: 2024-05-20
  • Contact: HUANG Shuyi
  • Funding: National Natural Science Foundation of China General Program (62372488)

Abstract:

In video analysis tasks, balancing the accuracy requirements and computational costs of object detection with deep neural networks is challenging. Existing methods predominantly treat entire video frames as the units of computational resource allocation: to minimize computational costs while ensuring accuracy, they allocate more resources to frames with high information value and fewer or no resources to frames with low information value. However, this strategy overlooks the uneven distribution of objects of interest within each frame, so a full frame whose highly informative regions occupy only a small portion of the image may receive excessive resources, incurring unnecessary detection overhead. To address this issue, an efficient video object detection method based on partitioning is proposed. After a video frame is partitioned, the features of the objects of interest in each partition are rapidly extracted and processed, and a configuration mapping analyzer maps each partition to the detection configuration that satisfies the accuracy requirement at the lowest detection cost. Partitions that share the same detection configuration are then concatenated and detected together, further reducing the overall detection cost. Finally, a corrective step repairs the fragmentation of objects at partition edges, which would otherwise degrade detection accuracy. Experimental results demonstrate that, while meeting the accuracy requirements, the method significantly reduces computational costs, with a maximum saving of 90.84%. At comparable costs, it improves accuracy by 10.48% to 23.13% on average. The approach thus achieves efficient video object detection and adapts well to both static and dynamic scenarios.

Key words: video analysis, deep neural networks, object detection, partitioning, configuration mapping analyzer, concatenation detection
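
The following Python sketch is a minimal illustration of the pipeline the abstract describes: a frame is split into partitions, a stand-in analyzer assigns each partition a detection configuration, and partitions sharing a configuration are concatenated for a single detection pass. The configuration names, the edge-density heuristic used in place of the configuration mapping analyzer, and the detector_for_config mapping are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def partition_frame(frame, rows=2, cols=2):
        # Split a frame of shape (H, W, C) into an equal grid of partitions,
        # keyed by their (row, col) position.
        h, w = frame.shape[:2]
        ph, pw = h // rows, w // cols
        return [((r, c), frame[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
                for r in range(rows) for c in range(cols)]

    def map_configuration(partition):
        # Stand-in for the configuration mapping analyzer: pick the cheapest
        # configuration expected to meet the accuracy requirement. A simple
        # edge-density proxy replaces the learned analysis described in the paper.
        gray = partition.mean(axis=2)
        density = (np.abs(np.diff(gray, axis=0)).mean()
                   + np.abs(np.diff(gray, axis=1)).mean())
        if density < 5.0:
            return "small_low_res"
        if density < 15.0:
            return "medium"
        return "large_high_res"

    def detect_frame(frame, detector_for_config):
        # Group partitions by their mapped configuration, concatenate each group
        # into one batch, and run a single detection pass per configuration.
        groups = {}
        for key, part in partition_frame(frame):
            groups.setdefault(map_configuration(part), []).append((key, part))
        results = {}
        for config, members in groups.items():
            batch = np.stack([part for _, part in members])   # concatenated detection
            detections = detector_for_config[config](batch)   # hypothetical detector call
            for (key, _), det in zip(members, detections):
                results[key] = det
        # Repairing objects fragmented across partition borders (the paper's final
        # corrective step) is omitted from this sketch.
        return results

In practice, detector_for_config might map each configuration name to a detector variant (for example, a model run at a given input resolution), and a final step would merge boxes split across partition borders before reporting results.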
