Event Camera Object Detection Based on Image Reconstruction and Cross-Modal Feature Fusion

doi:10.19678/j.issn.1000-3428.0253447

Abstract

Event cameras record brightness changes in a scene as asynchronous event streams and offer low latency and high dynamic range. Because they sense only brightness changes rather than complete visual information, static texture is missing, which degrades the performance of object detection systems that take event data as input. To address this problem, this paper exploits features extracted by an image reconstruction network as a complement to event features in order to improve the accuracy of event-based object detection. A sparsity-driven channel attention module is proposed to preliminarily filter and enhance the reconstructed image features. A cross-modal fusion mechanism is then constructed in which event features are the primary modality and reconstructed image features serve as modulation signals; spatially adaptive normalization parameters are used to fuse the two modalities. Experimental results show that, compared with existing event-based object detection methods, the proposed method improves mAP by 1.3% and 0.6% on the Gen1 and 1 Mpx datasets, respectively. By introducing reconstructed image features and combining them with sparsity-driven channel attention, the method achieves efficient cross-modal feature fusion and improves the performance of event-camera object detection systems. It offers an effective route to high-precision event-based perception in complex scenes and has practical application value.
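The two components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and everything beyond what the abstract states is an assumption: the function names `sparsity_channel_attention` and `spade_fuse` are hypothetical, channel sparsity is measured here with the Hoyer index, sparser channels are given larger weights, and the modulation parameters come from random 1x1 projections standing in for learned convolutions.

```python
import numpy as np


def sparsity_channel_attention(img_feat, eps=1e-6):
    """Reweight the channels of img_feat (C, H, W) by a sparsity score.

    Assumption: sparsity is the Hoyer index (based on the L1/L2 ratio);
    the abstract does not say which measure the paper uses.
    """
    c = img_feat.shape[0]
    flat = np.abs(img_feat.reshape(c, -1))
    n = flat.shape[1]
    l1 = flat.sum(axis=1)
    l2 = np.sqrt((flat ** 2).sum(axis=1))
    # Hoyer index: close to 1 for a sparse channel, close to 0 for a dense one.
    hoyer = (np.sqrt(n) - l1 / (l2 + eps)) / (np.sqrt(n) - 1.0)
    hoyer = np.clip(hoyer, 0.0, 1.0)
    # Assumption: standardize the scores and squash them with a sigmoid.
    z = (hoyer - hoyer.mean()) / (hoyer.std() + eps)
    weights = 1.0 / (1.0 + np.exp(-z))
    return img_feat * weights[:, None, None], weights


def spade_fuse(event_feat, img_feat, w_gamma, w_beta, eps=1e-5):
    """Modulate event features with per-pixel scale and shift from image features.

    event_feat: (C_e, H, W), the primary modality.
    img_feat:   (C_i, H, W), the modulation signal.
    w_gamma, w_beta: (C_e, C_i) hypothetical 1x1 projection weights.
    """
    # Parameter-free normalization of each event channel over its spatial extent.
    mean = event_feat.mean(axis=(1, 2), keepdims=True)
    var = event_feat.var(axis=(1, 2), keepdims=True)
    normed = (event_feat - mean) / np.sqrt(var + eps)
    # Spatially adaptive parameters: one (gamma, beta) pair per channel and pixel.
    gamma = np.einsum("oc,chw->ohw", w_gamma, img_feat)
    beta = np.einsum("oc,chw->ohw", w_beta, img_feat)
    return normed * (1.0 + gamma) + beta


rng = np.random.default_rng(0)
event_feat = rng.normal(size=(8, 16, 16))  # stand-in for event features
img_feat = rng.normal(size=(4, 16, 16))    # stand-in for reconstructed image features

img_att, weights = sparsity_channel_attention(img_feat)
w_gamma = rng.normal(scale=0.1, size=(8, 4))
w_beta = rng.normal(scale=0.1, size=(8, 4))
fused = spade_fuse(event_feat, img_att, w_gamma, w_beta)
print(fused.shape)  # (8, 16, 16)
```

In this sketch the fused output keeps the shape of the event features, so the event branch stays dominant and the image branch only rescales and shifts it. With zero projection weights the output reduces to the normalized event features.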

Zhong Junjian (钟钧健), Chen Weigang (陈卫刚). Event Camera Object Detection Based on Image Reconstruction and Cross-Modal Feature Fusion[J]. Computer Engineering (计算机工程), doi: 10.19678/j.issn.1000-3428.0253447.
