
Computer Engineering, 2025, Vol. 51, Issue (6): 349-359. doi: 10.19678/j.issn.1000-3428.0068539

Development Research and Engineering Application

Adaptive Spatial Transformation Method for Vehicle Detection Based on Roadside Cameras

HUA Jiabao*, ZHANG Jingrui, ZHU Fumin, CHEN Lu

  1. Logistics Engineering College, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2023-10-10 Online: 2025-06-15 Published: 2024-05-21
  • Contact: HUA Jiabao
  • Funding: Shanghai Municipal Science and Technology Program Project (22dz1204100)

Abstract:

To improve the accuracy and efficiency of vehicle detection with roadside cameras, this study proposes a vehicle detection model that combines a Convolutional Neural Network (CNN) with the Transformer architecture. To cope with complex traffic scenes, an adaptive spatial Transformer is designed and combined with ResNet50 to form a backbone network that can handle changes in vehicle viewing angle and scale. Position encodings based on angle and distance are designed to optimize the Transformer's input so that the model makes full use of the spatial information in the image, and a channel-space attention module is adopted to better capture contextual information in the image. In the decoder, the autoregressive mechanism is removed so that the model can decode multiple targets in parallel, and a set of target query embeddings is introduced to better adapt the decoder to the vehicle detection task. Experimental results show that the proposed model achieves mAP@0.5 values of 96.42%, 87.82%, and 98.64% on the UA-DETRAC, IITM-hetra, and self-collected datasets, respectively, outperforming the compared models at all object sizes. Ablation experiments further confirm the key contribution of each module to the overall performance.
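The abstract describes the angle-and-distance position encoding only at a high level. The sketch below illustrates one plausible form of such an encoding and is not the authors' implementation. It assumes that angle and distance are measured from the feature-map centre, that the channels are split evenly between a sinusoidal encoding of normalised distance (in the style of the original Transformer) and sin/cos harmonics of the polar angle, and that `polar_position_encoding`, `temperature`, and the channel split are hypothetical choices made for illustration.

```python
import math
from typing import List


def polar_position_encoding(height: int, width: int, d_model: int,
                            temperature: float = 10000.0) -> List[List[List[float]]]:
    """Hypothetical angle-and-distance position encoding for an H x W feature map.

    Assumption (not from the paper): the first d_model/2 channels encode the
    normalised distance of each cell from the map centre with a geometric ladder
    of sinusoid frequencies; the remaining d_model/2 channels encode the polar
    angle with sin(k*theta) / cos(k*theta) harmonics. Indexed [row][col].
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2              # channels for distance; the same number for angle
    n_freq = half // 2               # each frequency yields one sin and one cos channel
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0   # assumed reference point: map centre
    max_r = math.hypot(cy, cx) or 1.0                # corner distance; 1.0 for a 1x1 map

    grid = []
    for i in range(height):
        row = []
        for j in range(width):
            dy, dx = i - cy, j - cx
            r = math.hypot(dx, dy) / max_r   # normalised distance in [0, 1]
            theta = math.atan2(dy, dx)       # polar angle in [-pi, pi]
            vec = []
            # Distance part: pi * r keeps even the fastest sinusoid within half
            # a period, so distinct distances never wrap onto the same code.
            for k in range(n_freq):
                arg = math.pi * r / temperature ** (2 * k / half)
                vec.extend((math.sin(arg), math.cos(arg)))
            # Angle part: integer harmonics are 2*pi-periodic, so the code is
            # continuous across the -pi/+pi seam.
            for k in range(1, n_freq + 1):
                vec.extend((math.sin(k * theta), math.cos(k * theta)))
            row.append(vec)
        grid.append(row)
    return grid


if __name__ == "__main__":
    pe = polar_position_encoding(height=5, width=5, d_model=16)
    print(len(pe), len(pe[0]), len(pe[0][0]))  # prints: 5 5 16
```

Encoding the angle with integer harmonics rather than feeding it into a linear frequency ladder keeps two cells on either side of the ±π seam close in embedding space. In DETR-style models, vectors like these would typically be added to the flattened backbone features at the input of the attention layers; how the paper injects its encoding is not stated in the abstract.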

Key words: adaptive spatial transformation, Transformer, vehicle detection, channel-space attention mechanism, roadside camera