[1] ALI S B, FILIP S I, SENTIEYS O. A stochastic rounding-enabled low-precision floating-point MAC for DNN training[C]//2024 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2024: 1-6.
[2] WONG Y, DONG Z, ZHANG W. Low bitwidth CNN accelerator on FPGA using Winograd and block floating point arithmetic[C]//2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2021: 218-223.
[3] NI C, LU J, LIN J, et al. LBFP: Logarithmic block floating point arithmetic for deep neural networks[C]//2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). IEEE, 2020: 201-204.
[4] PENG Y, WANG Y B, LIANG L, et al. Winograd heterogeneous sampling window convolution acceleration operator[J]. Computer Engineering, 2025, 51(9): 71-79.
[5] ZOU L, ZHAO W, YIN S, et al. BiE: bi-exponent block floating-point for large language models quantization[C]//Forty-first International Conference on Machine Learning. 2024.
[6] HAN X, CHENG Y, WANG J, et al. BBAL: A bidirectional block floating point-based quantisation accelerator for large language models[C]//2025 62nd ACM/IEEE Design Automation Conference (DAC). IEEE, 2025: 1-7.
[7] SONG Z, LIU Z, WANG D. Computation error analysis of block floating point arithmetic oriented convolution neural network accelerator design[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1).
[8] LEE S, CHOI J, NOH S, et al. DBPS: dynamic block size and precision scaling for efficient DNN training supported by RISC-V ISA extensions[C]//2023 60th ACM/IEEE Design Automation Conference (DAC). IEEE, 2023: 1-6.
[9] NASCIMENTO M G, PRISACARIU V A, FAWCETT R, et al. Hyperblock floating point: Generalised quantization scheme for gradient and inference computation[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023: 6364-6373.
[10] LO Y C, LIU R S. Bucket getter: A bucket-based processing engine for low-bit block floating point (BFP) DNNs[C]//Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. 2023: 1002-1015.
[11] LU L, LIANG Y, XIAO Q, et al. Evaluating fast algorithms for convolutional neural networks on FPGAs[C]//2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2017: 101-108.
[12] AHMAD A, PASHA M A. FFConv: an FPGA-based accelerator for fast convolution layers in convolutional neural networks[J]. ACM Transactions on Embedded Computing Systems (TECS), 2020, 19(2): 1-24.
[13] ELEFTHERIADIS C, KARAKONSTANTIS G. Energy-efficient fast Fourier transform for real-valued applications[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 69(5): 2458-2462.
[14] ZHANG F, GAO Z, HUANG J, et al. HFOD: A hardware-friendly quantization method for object detection on embedded FPGAs[J]. IEICE Electronics Express, 2022, 19(8): 20220067.
[15] FRASSER C F, LINARES-SERRANO P, DE LOS RÍOS I D, et al. Fully parallel stochastic computing hardware implementation of convolutional neural networks for edge computing applications[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 34(12): 10408-10418.
[16] GUAN M X, LIU J K, ZHANG H R, et al. Study of FPGA-based error-controllable floating-point operation accelerators[J]. Computer Engineering, 2024, 50(5): 291-297.
[17] LEE J, LEE W, SIM J. Tender: Accelerating large language models via tensor decomposition and runtime requantization[C]//2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). IEEE, 2024: 1048-1062.
[18] NOH S H, KOO J, LEE S, et al. FlexBlock: A flexible DNN training accelerator with multi-mode block floating point support[J]. IEEE Transactions on Computers, 2023, 72(9): 2522-2535.
[19] ZHAO W, DANG Q, XIA T, et al. Optimizing FPGA-Based DNN accelerator with shared exponential floating-point format[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(11): 4478-4491.
[20] FANG C, SHI M, GEENS R, et al. Anda: Unlocking efficient LLM inference with a variable-length grouped activation data format[C]//2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2025: 1467-1481.
[21] ZHANG S Q, MCDANEL B, KUNG H T. FAST: DNN training under variable precision block floating point with stochastic rounding[C]//2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2022: 846-860.
[22] LANGHAMMER M, GRIBOK S, BAECKLER G. High density 8-bit multiplier systolic arrays for FPGA[C]//2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2020: 84-92.
[23] JANG J, KIM Y, LEE J, et al. FIGNA: Integer unit-based accelerator design for FP-INT GEMM preserving numerical accuracy[C]//2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2024: 760-773.
[24] LIU S, FAN H, LUK W. Accelerating fully spectral CNNs with adaptive activation functions on FPGA[C]//2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2021: 1530-1535.
[25] WANG X, ZHOU Z, YUAN Z, et al. FD-CNN: A Frequency-Domain FPGA Acceleration Scheme for CNN-Based Image-Processing Applications[J]. ACM Transactions on Embedded Computing Systems, 2023, 22(6): 1-30.
[26] YANG J, YUNE S, LIM S, et al. ACane: An Efficient FPGA-based Embedded Vision Platform with Accumulation-as-Convolution Packing for Autonomous Mobile Robots[C]//2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2024: 533-538.
[27] DOON R, RAWAT T K, GAUTAM S. CIFAR-10 classification using deep convolutional neural network[C]//2018 IEEE Punecon. IEEE, 2018: 1-5.
[28] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009: 248-255.
[29] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[30] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
[31] PRASAD B M P, PARANE K, TALAWAR B. High-performance NoC simulation acceleration framework employing the Xilinx DSP48E1 blocks[C]//2019 International Symposium on VLSI Design, Automation and Test (VLSI-DAT). IEEE, 2019: 1-4.