
Computer Engineering

   

Lightweight ViT with Orientation and Frequency Awareness

  

  • Published: 2026-03-11

Original title: 融合方位与频域感知的轻量级ViT (A Lightweight ViT Fusing Orientation and Frequency-Domain Awareness)

Abstract: Existing lightweight Vision Transformers (ViTs) lack explicit structural and spectral priors during token construction, which causes local high-frequency details to be lost and limits representation efficiency. To address this, this paper proposes OFT-Former (Orientation- and Frequency-Aware Token Interaction Transformer), a lightweight model that fuses orientation and frequency-domain awareness. First, an Orientation-Aware Patch Embedding (OAPE) module explicitly injects horizontal and vertical spatial structural priors during token construction, compensating for the weak geometric perception of conventional patch embedding. Second, a Frequency-Enhanced Token Refinement (FETR) module uses the Fast Fourier Transform (FFT) to decouple frequency-domain features and combines this with multi-scale convolutions to better preserve high-frequency details. Third, a Bidirectional Gated Token Modulation (BGTM) mechanism establishes bidirectional interaction pathways between local and global features and adaptively fuses cross-scale representations through dynamic gating. Experiments show that OFT-Former achieves 81.4% Top-1 accuracy on ImageNet-1K with only 12.8M parameters and 1.8 GFLOPs. The model also performs well on CIFAR-100 classification and on COCO object detection and instance segmentation, which supports the effectiveness of the proposed method.

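Abstract (translated from the Chinese original): Existing lightweight vision Transformers lack explicit structural and frequency-domain priors at the token construction stage, which leads to the loss of local high-frequency details and limits representation efficiency. To address this problem, this paper proposes a lightweight model that fuses orientation and frequency-domain awareness, called OFT-Former. First, an orientation-aware patch embedding module is designed that explicitly introduces horizontal and vertical spatial structural priors at the token construction stage, compensating for the weakness of conventional patch embedding in capturing geometric information. Second, a frequency-enhanced token refinement module is built that uses the fast Fourier transform to decouple frequency-domain features and combines it with multi-scale convolutions to strengthen the preservation of high-frequency details. Further, a bidirectional gated token modulation mechanism is proposed that establishes bidirectional interaction pathways between local and global features and achieves adaptive fusion of cross-scale features through dynamic gating. Experimental results show that OFT-Former achieves 81.4% Top-1 accuracy on ImageNet-1K with 12.8M parameters and a computational cost of 1.8 GFLOPs, and that it also performs well on CIFAR-100 classification and on COCO object detection and instance segmentation, which verifies the effectiveness of the model.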
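To make the idea of FFT-based frequency decoupling concrete, the following is a minimal NumPy sketch of splitting a single 2D feature map into low- and high-frequency components with a radial mask in the Fourier domain. It is an illustration under stated assumptions, not the FETR module itself: the abstract does not say how the spectrum is partitioned, so the function name `split_frequencies`, the hard radial mask, and the `cutoff` value are all hypothetical choices made here for clarity.

```python
import numpy as np


def split_frequencies(feature_map, cutoff=0.25):
    """Split a 2D map into low- and high-frequency parts via an FFT mask.

    `cutoff` is a hypothetical radius in normalized frequency units
    (cycles per sample, at most 0.5 per axis); the abstract gives no value.
    """
    height, width = feature_map.shape
    spectrum = np.fft.fft2(feature_map)

    # Normalized frequency coordinate of every FFT bin along each axis.
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]

    # Bins inside the radius count as "low", everything else as "high".
    low_mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff

    # The two masks are complementary, so the parts sum back to the input.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


if __name__ == "__main__":
    # A constant offset plus fine vertical stripes at the Nyquist frequency.
    stripes = np.tile([1.0, -1.0], (8, 4))
    image = 2.0 + stripes
    low, high = split_frequencies(image)
    print("low-frequency part is constant:", np.allclose(low, 2.0))
    print("high-frequency part is the stripes:", np.allclose(high, stripes))
```

Because the masks partition the spectrum, the decomposition is lossless, and a model could process the high-frequency branch separately (for example with convolutions) before recombining. A learned or soft mask would be an equally plausible design; the hard mask is used here only because it is the simplest to verify.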