[1] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]//9th International Conference on Learning Representations. Virtual Event, Austria: OpenReview.net, 2021.
[2] LIU Z, LIN Y, CAO Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. Washington D. C., USA: IEEE, 2021: 9992-10002.
[3] HAN K, WANG Y, CHEN H, et al. A survey on vision transformer[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 87-110.
[4] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017: 6000-6010.
[5] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates, Inc., 2020: 1877-1901.
[6] SHAZEER N, MIRHOSEINI A, MAZIARZ K, et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer[C]//5th International Conference on Learning Representations. Toulon, France: OpenReview.net, 2017.
[7] FEDUS W, ZOPH B, SHAZEER N. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity[J]. Journal of Machine Learning Research, 2022, 23(120): 1-39.
[8] LEPIKHIN D, LEE H, XU Y, et al. GShard: Scaling giant models with conditional computation and automatic sharding[C]//9th International Conference on Learning Representations. Virtual Event, Austria: OpenReview.net, 2021.
[9] DAI D, DENG C, ZHAO C, et al. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models[C]//Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand: Association for Computational Linguistics, 2024: 1280-1297.
[10] SHI H Z, ZHAO J, ZHAO Y Q, et al. Survey on mixture of experts system optimization in the era of large models[J]. Journal of Computer Research and Development, 2025, 62(5): 1164-1189. (in Chinese)
[11] NECHI A, GROTH L, MULHEM S, et al. FPGA-based deep learning inference accelerators: Where are we standing?[J]. ACM Transactions on Reconfigurable Technology and Systems, 2023, 16(4): 1-32.
[12] VENIERIS S I, BOUGANIS C S. fpgaConvNet: Mapping regular and irregular convolutional neural networks on FPGAs[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(2): 326-342.
[13] YU Z J, MA D, YAN X L, et al. FPGA-based accelerator for convolutional neural network[J]. Computer Engineering, 2017, 43(1): 109-114, 119. (in Chinese)
[14] LOU W, QIN Y, WANG Z, et al. Automated FPGA accelerator generation framework for transformers with dataflow optimization[C]//Proceedings of the 54th International Conference on Parallel Processing. New York, USA: ACM, 2025: 406-416.
[15] DONG J, LOU W, WU H, et al. MoE-Sched: Enabling efficient FPGA deployment of mixture-of-experts vision transformers via coordinated scheduling[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2025. DOI: 10.1109/TVLSI.2025.3604705.
[16] DONG J, LOU W, ZHENG Z, et al. UbiMoE: A ubiquitous mixture-of-experts vision transformer accelerator with hybrid computation pattern on FPGA[C]//2025 IEEE International Symposium on Circuits and Systems. Washington D. C., USA: IEEE, 2025: 1-5.
[17] SARKAR R, LIANG H, FAN Z, et al. Edge-MoE: Memory-efficient multi-task vision transformer architecture with task-level sparsity via mixture-of-experts[C]//2023 IEEE/ACM International Conference on Computer Aided Design. San Francisco, CA, USA: IEEE, 2023: 1-9.
[18] HE J, QIU J, ZENG A, et al. FastMoE: A fast mixture-of-expert training system[J/OL]. arXiv preprint, 2021[2026-03-24]. https://arxiv.org/abs/2103.13262.
[19] RAJBHANDARI S, LI C, YAO Z, et al. DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale[C]//Proceedings of the 39th International Conference on Machine Learning. Baltimore, USA: PMLR, 2022: 18332-18346.
[20] FRANTAR E, ALISTARH D. QMoE: Sub-1-bit compression of trillion parameter models[C]//Proceedings of the 7th Conference on Machine Learning and Systems. Santa Clara, CA, USA: mlsys.org, 2024.
[21] KIM S, GHOLAMI A, YAO Z, et al. I-BERT: Integer-only BERT quantization[C]//Proceedings of the 38th International Conference on Machine Learning. Virtual Event: PMLR, 2021: 5506-5518.
[22] LU X, LIU Q, XU Y, et al. Not all experts are equal: Efficient expert pruning and skipping for mixture-of-experts large language models[C]//Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand: Association for Computational Linguistics, 2024: 6159-6172.
[23] XILINX. Vitis AI user guide[EB/OL]. (2023)[2024-03-01]. https://www.xilinx.com/.
[24] ZHANG X, YE H, WANG J, et al. DNNExplorer: A framework for modeling and exploring a novel paradigm of FPGA-based DNN accelerator[C]//Proceedings of the 39th IEEE/ACM International Conference on Computer-Aided Design. Washington D. C., USA: IEEE, 2020: 1-9.
[25] WANG T, GONG L, WANG C, et al. ViA: A novel vision-transformer accelerator based on FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, 41(11): 4088-4099.
[26] WANG H, ZHANG Z, HAN S. SpAtten: Efficient sparse attention architecture with cascade token and head pruning[C]//2021 IEEE International Symposium on High-Performance Computer Architecture. Seoul, South Korea: IEEE, 2021: 97-110.
[27] LU L, JIN Y, BI H, et al. Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture[C]//Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture. New York, USA: ACM, 2021: 977-991.
[28] BIGGS B, BOUGANIS C S, CONSTANTINIDES G. ATHEENA: A toolflow for hardware early-exit network automation[C]//2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines. Marina Del Rey, CA, USA: IEEE, 2023: 121-132.
[29] LIN X, TIAN H, XUE W, et al. FLAME: Fully leveraging MoE sparsity for transformer on FPGA[C]//Proceedings of the 61st ACM/IEEE Design Automation Conference. Washington D. C., USA: IEEE, 2024: 1-6.