Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 20-30. doi: 10.19678/j.issn.1000-3428.0069369

• Image Processing Based on Perceptual Information •

Reconstruction of Video Snapshot Compressive Imaging Based on Triple Self-Attention

ZHOU Yu1, XIE Wei1, Kwong Tak Wu2, JIANG Jianmin1,*

  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518000, Guangdong, China
  2. Department of Computing and Decision Sciences, Lingnan University, Hong Kong 999077, China
  • Received: 2024-02-08 Online: 2025-01-15 Published: 2025-02-08
  • Contact: JIANG Jianmin
  • Funding:
    Key Program of the National Natural Science Foundation of China (62032015); Shenzhen Science and Technology Innovation Commission Basic Research General Program (JCYJ20220810112354002)

Abstract:

Video Snapshot Compressive Imaging (SCI) is a computational imaging technique that achieves efficient imaging through hybrid compression in the temporal and spatial domains. In video SCI, the original video signal can be effectively reconstructed by exploiting the sparsity of the signal and its correlations in the temporal and spatial domains with a suitable reconstruction algorithm. Although deep learning-based reconstruction algorithms have achieved good results in most tasks, they still suffer from excessive model complexity and slow reconstruction. To address these issues, this paper proposes SCT-SCI, a video SCI reconstruction network based on triple self-attention, which uses a multi-branch grouped self-attention mechanism to exploit correlations in the temporal and spatial domains. The SCT-SCI model consists of a feature extraction module, a video reconstruction module, and multiple triple self-attention modules called SCT-Blocks. Each SCT-Block consists of a window self-attention branch, a channel self-attention branch, and a temporal self-attention branch. In addition, a spatial fusion module, SC-2DFusion, and a global fusion module, SCT-3DFusion, are introduced to strengthen feature fusion. Experimental results on simulated video datasets show that the model has low complexity: while maintaining reconstruction quality close to that of the EfficientSCI model, it reduces reconstruction time by 31.58%, improving real-time performance.
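The hybrid temporal and spatial compression mentioned in the abstract can be illustrated with a minimal sketch. It assumes the measurement model commonly used in the video SCI literature, in which each frame is multiplied element-wise by its own coding mask and the modulated frames are summed into one 2D snapshot. The abstract does not spell out this model, so the function name `sci_snapshot`, the array sizes, and the random binary masks are all illustrative assumptions.

```python
import numpy as np


def sci_snapshot(frames, masks):
    """Collapse T frames into one 2D measurement (assumed standard SCI model).

    frames, masks: arrays of shape (T, H, W).
    Each frame is modulated by its own mask (spatial coding),
    then all modulated frames are summed (temporal compression).
    """
    frames = np.asarray(frames, dtype=float)
    masks = np.asarray(masks, dtype=float)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must have the same shape")
    return (frames * masks).sum(axis=0)


# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
T, H, W = 8, 16, 16
frames = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W))
snapshot = sci_snapshot(frames, masks)
print(snapshot.shape)  # (16, 16)
```

Under this assumption, a reconstruction network such as SCT-SCI would take the snapshot and the masks as input and estimate the T original frames.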

Key words: Snapshot Compressive Imaging (SCI), compressive sensing, Transformer architecture, deep learning, feature fusion
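
The three branches of an SCT-Block named in the abstract can be pictured as self-attention applied over three different groupings of the same feature tensor. The sketch below is not the authors' implementation: it assumes a feature tensor of shape (T, H, W, C), uses identity query/key/value projections instead of learned weights, assumes non-overlapping square windows for the window branch, and replaces the SC-2DFusion and SCT-3DFusion modules (whose details the abstract does not give) with a plain sum. All function names are hypothetical.

```python
import numpy as np


def attend(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: (batch, n_tokens, dim). A real block would use learned projections.
    """
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def window_branch(x, ws):
    """Tokens are the pixels inside each ws x ws window of each frame.

    Assumes H and W are divisible by ws.
    """
    T, H, W, C = x.shape
    w = x.reshape(T, H // ws, ws, W // ws, ws, C).transpose(0, 1, 3, 2, 4, 5)
    out = attend(w.reshape(-1, ws * ws, C))
    out = out.reshape(T, H // ws, W // ws, ws, ws, C).transpose(0, 1, 3, 2, 4, 5)
    return out.reshape(T, H, W, C)


def channel_branch(x):
    """Tokens are the C channels of each frame; features are pixel positions."""
    T, H, W, C = x.shape
    tokens = x.reshape(T, H * W, C).transpose(0, 2, 1)  # (T, C, H*W)
    return attend(tokens).transpose(0, 2, 1).reshape(T, H, W, C)


def temporal_branch(x):
    """Tokens are the T frames observed at each pixel position."""
    T, H, W, C = x.shape
    tokens = x.reshape(T, H * W, C).transpose(1, 0, 2)  # (H*W, T, C)
    return attend(tokens).transpose(1, 0, 2).reshape(T, H, W, C)


# Hypothetical feature tensor: 4 frames, 8x8 pixels, 6 channels.
rng = np.random.default_rng(0)
x = rng.random((4, 8, 8, 6))
# Placeholder fusion: a plain sum, NOT the paper's fusion modules.
out = window_branch(x, ws=4) + channel_branch(x) + temporal_branch(x)
print(out.shape)  # (4, 8, 8, 6)
```

Each branch keeps the input shape, so the branch outputs can be combined element-wise. Restricting attention to one grouping at a time is a common way to lower cost compared with attention over all T × H × W positions at once.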