Computer Engineering (计算机工程)

A Frequency-Domain Processing Element for Accelerating 2D Convolution Based on Block Floating Point

Published: 2026-02-11

Abstract: Block floating point (BFP), in which a block of values shares a single exponent while each value keeps its own mantissa, has been widely adopted for the convolution computations of convolutional neural networks (CNNs). Frequency-domain convolution, in particular, transforms spatial-domain convolution into element-wise complex multiplications in the frequency domain, which significantly reduces computational complexity and enables efficient neural network deployment. However, existing studies focus mainly on either BFP-based convolution acceleration in the spatial domain or fixed-point convolution acceleration in the frequency domain, so the potential of combining the BFP format with frequency-domain convolution to reduce inference latency and improve resource efficiency remains underexplored. In this work, we propose a BFP-based frequency-domain processing element (PE) that exploits the structural characteristics of the digital signal processing (DSP) blocks in field-programmable gate arrays (FPGAs). By leveraging the exponent-sharing mechanism of the BFP format, the PE packs multiple complex multiplications for joint execution, thereby improving overall computational performance. Furthermore, we introduce a dataflow mapping method tailored to BFP-based frequency-domain convolution that maximizes the reuse of both the exponent and the mantissa components of BFP data. We systematically evaluate the proposed frequency-domain BFP accelerator on representative CNN benchmarks. Compared with state-of-the-art BFP-based spatial-domain convolution accelerators, the proposed approach achieves up to a 5.4× improvement in inference latency and an 8.5× improvement in resource efficiency.
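The abstract does not spell out the PE's internal arithmetic, so the following Python sketch only illustrates, under our own assumptions, the two ideas it combines. First, by the convolution theorem, a 2D circular convolution equals the inverse DFT of the element-wise product of the two spectra. Second, once each spectrum is stored as one BFP block (a shared exponent plus small signed integer mantissas), every complex product reduces to integer mantissa multiplications and one exponent addition per block; because the mantissas are narrow, two products that share an operand can be packed into one wide multiply, in the spirit of the INT8 operand-packing technique AMD/Xilinx describes for DSP48E2 slices. All names (`dft2`, `bfp_quantize`, `packed_mul`, `packed_cmul`, `bfp_freq_conv`), the 8-bit mantissa width, the 18-bit slot, and the choice of packing two output-channel kernels against one shared input coefficient are hypothetical, not the paper's actual design. The naive DFT stands in for an FFT, and the kernel flipping and tiling with zero padding that real CNN layers would need are omitted.

```python
import cmath
import math

SLOT = 18  # hypothetical width of the low product field in the packed operand


def dft2(x, inverse=False):
    """Naive 2D DFT (or inverse DFT) of an N x N grid; O(N^4), illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [[sum(x[r][c] * cmath.exp(sign * 2j * math.pi * (u * r + v * c) / n)
                for r in range(n) for c in range(n))
            for v in range(n)] for u in range(n)]
    return [[z / (n * n) for z in row] for row in out] if inverse else out


def circular_conv2(x, k):
    """Reference 2D circular convolution computed directly in the spatial domain."""
    n = len(x)
    return [[sum(x[(i - a) % n][(j - b) % n] * k[a][b]
                 for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]


def bfp_quantize(values, mant_bits=8):
    """Quantize complex values to one BFP block: a single shared exponent plus
    signed integer mantissas (real, imag) limited to mant_bits bits."""
    peak = max(max(abs(z.real), abs(z.imag)) for z in values)
    if peak == 0:
        return [(0, 0) for _ in values], 0
    exp = math.frexp(peak)[1] - (mant_bits - 1)  # peak * 2**-exp < 2**(mant_bits-1)
    lim = 2 ** (mant_bits - 1) - 1

    def q(v):
        return max(-lim, min(lim, round(math.ldexp(v, -exp))))

    return [(q(z.real), q(z.imag)) for z in values], exp


def packed_mul(a, b, c, slot=SLOT):
    """Return (a*c, b*c) from ONE wide multiply of the packed operand (a << slot) + b.
    Exact while -2**(slot-1) <= b*c < 2**(slot-1)."""
    prod = ((a << slot) + b) * c
    half = 1 << (slot - 1)
    low = ((prod + half) & ((1 << slot) - 1)) - half  # sign-extend the low field
    return (prod - low) >> slot, low


def packed_cmul(x, w1, w2):
    """Complex mantissa products x*w1 and x*w2 from four packed multiplies
    (eight real products) instead of eight separate multiplies."""
    xr, xi = x
    rr1, rr2 = packed_mul(w1[0], w2[0], xr)  # xr*w1r, xr*w2r
    ii1, ii2 = packed_mul(w1[1], w2[1], xi)  # xi*w1i, xi*w2i
    ri1, ri2 = packed_mul(w1[1], w2[1], xr)  # xr*w1i, xr*w2i
    ir1, ir2 = packed_mul(w1[0], w2[0], xi)  # xi*w1r, xi*w2r
    return (rr1 - ii1, ri1 + ir1), (rr2 - ii2, ri2 + ir2)


def bfp_freq_conv(x, k1, k2, mant_bits=8):
    """Convolve x with two kernels via BFP spectra: each shared input mantissa is
    multiplied by both kernels' mantissas in packed form, and the block exponents
    are combined once per output block rather than once per element."""
    n = len(x)
    mx, ex = bfp_quantize([z for row in dft2(x) for z in row], mant_bits)
    m1, e1 = bfp_quantize([z for row in dft2(k1) for z in row], mant_bits)
    m2, e2 = bfp_quantize([z for row in dft2(k2) for z in row], mant_bits)
    s1, s2 = 2.0 ** (ex + e1), 2.0 ** (ex + e2)  # one shared scale per output block
    y1, y2 = [], []
    for a, b1, b2 in zip(mx, m1, m2):
        p1, p2 = packed_cmul(a, b1, b2)  # integer-only mantissa arithmetic
        y1.append(complex(*p1) * s1)
        y2.append(complex(*p2) * s2)
    outs = []
    for y in (y1, y2):
        grid = [y[i * n:(i + 1) * n] for i in range(n)]
        outs.append([[z.real for z in row] for row in dft2(grid, inverse=True)])
    return outs[0], outs[1]


if __name__ == "__main__":
    img = [[(3 * r + 5 * c) % 7 for c in range(4)] for r in range(4)]
    k1 = [[1, 2, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    k2 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    y1, y2 = bfp_freq_conv(img, k1, k2)
    for y, k in ((y1, k1), (y2, k2)):
        ref = circular_conv2(img, k)
        err = max(abs(a - b) for ra, rb in zip(y, ref) for a, b in zip(ra, rb))
        print(f"max |BFP - exact| = {err:.4f}")
```

Running the script prints how far each BFP result deviates from the exact circular convolution. In this sketch the packed integer products are exact as long as each low-field product fits in its 18-bit slot (8-bit mantissas give at most 127 × 127 = 16129), so the deviation comes from mantissa rounding in `bfp_quantize`, not from the packing itself.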