
Computer Engineering (计算机工程)


Event Camera Object Detection Based on Image Reconstruction and Cross-Modal Feature Fusion

  • Online: 2026-03-24 Published: 2026-03-24

Abstract: Event cameras record brightness changes in a scene as asynchronous event streams and offer low latency and high dynamic range. However, because they perceive only brightness changes rather than complete visual information, static texture information is missing, which can degrade the performance of object detection systems that take event streams as input. To address this issue, this paper exploits the complementary information in features extracted by an image reconstruction network to improve the accuracy of event-based object detection. A sparsity-driven channel attention module is proposed to preliminarily filter and enhance the reconstructed-image features. A cross-modal fusion mechanism is then constructed in which event features are the primary modality and reconstructed-image features serve as modulation signals; spatially adaptive normalization parameters are used to fuse the two modalities effectively. Experimental results show that, compared with existing event-based object detection methods, the proposed method improves mAP by 1.3% and 0.6% on the Gen1 and 1 Mpx datasets, respectively. By introducing reconstructed-image features and combining them with a sparsity-driven channel attention mechanism, the method achieves efficient cross-modal feature fusion and improves the performance of event camera-based object detection. It offers an effective approach to high-precision event-based perception in complex scenarios and has practical application value.
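The abstract describes the two modules only at a high level. The following is a minimal sketch of one possible reading, assuming single-sample NumPy feature maps of shape (C, H, W) with matching spatial size for both branches. The sparsity criterion (fraction of activations above a threshold `tau`), the function names, and the 1×1 projection weights `w_gamma`/`w_beta` are hypothetical choices for illustration, not the authors' design. The fusion step follows the general SPADE pattern of predicting per-pixel scale and shift maps from a conditioning input, simplified here to per-channel (instance-style) normalization and 1×1 projections.

```python
import numpy as np


def sparsity_channel_attention(img_feat: np.ndarray, tau: float = 1e-2, eps: float = 1e-6):
    """Reweight channels of reconstructed-image features of shape (C, H, W).

    Hypothetical criterion: a channel's density is the fraction of activations
    with |x| > tau. Channels denser than average get weights above 0.5, sparser
    ones get weights below 0.5, via a sigmoid of the standardized density.
    Returns (reweighted features, per-channel weights).
    """
    density = (np.abs(img_feat) > tau).mean(axis=(1, 2))       # (C,)
    z = (density - density.mean()) / (density.std() + eps)     # standardize across channels
    weights = 1.0 / (1.0 + np.exp(-z))                         # sigmoid -> (0, 1)
    return img_feat * weights[:, None, None], weights


def spatially_adaptive_fusion(event_feat: np.ndarray, img_feat: np.ndarray,
                              w_gamma: np.ndarray, w_beta: np.ndarray,
                              eps: float = 1e-5) -> np.ndarray:
    """SPADE-style fusion: event features dominate, image features modulate.

    event_feat: (C, H, W); img_feat: (C_img, H, W).
    w_gamma, w_beta: (C, C_img) weights of hypothetical 1x1 projections that
    predict per-pixel scale (gamma) and shift (beta) maps from img_feat.
    """
    mean = event_feat.mean(axis=(1, 2), keepdims=True)
    var = event_feat.var(axis=(1, 2), keepdims=True)
    normed = (event_feat - mean) / np.sqrt(var + eps)          # per-channel normalization
    gamma = np.einsum("oc,chw->ohw", w_gamma, img_feat)        # (C, H, W) scale map
    beta = np.einsum("oc,chw->ohw", w_beta, img_feat)          # (C, H, W) shift map
    return normed * (1.0 + gamma) + beta                       # common (1 + gamma) variant


# Usage with random stand-in features (not real event or image data).
rng = np.random.default_rng(0)
C, C_img, H, W = 8, 4, 16, 16
event_feat = rng.standard_normal((C, H, W))
img_feat = rng.standard_normal((C_img, H, W))
img_feat[0] = 0.0                      # an all-zero, maximally sparse channel
img_att, w = sparsity_channel_attention(img_feat)
w_gamma = 0.1 * rng.standard_normal((C, C_img))
w_beta = 0.1 * rng.standard_normal((C, C_img))
fused = spatially_adaptive_fusion(event_feat, img_att, w_gamma, w_beta)
print(fused.shape)  # (8, 16, 16)
```

Predicting gamma and beta per pixel, rather than one scalar per channel, is what makes the normalization spatially adaptive: the image branch can rescale and shift the event features differently at each location, while the normalized event features still carry the primary content. With zero projection weights the fusion reduces to plain normalization of the event features.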