1 |
|
2 |
|
3 |
HUANG H F. Interpreting the "brute-force" AI chip Ascend 910. Communications World, 2019(24): 22-23. (in Chinese)
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
|
9 |
|
10 |
ANZT H, BOMAN E G, GATES M, et al. Towards use of mixed precision in ECP math libraries[R]. Livermore, USA: Lawrence Livermore National Laboratory, 2021.
|
11 |
|
12 |
Innovative Computing Laboratory. HPL-AI: high-performance linpack for artificial intelligence[EB/OL]. [2023-10-01]. https://icl.utk.edu/hpl-ai/.
|
13 |
HPL-MxP Team. HPL-MxP: high-performance linpack mixed precision benchmark[EB/OL]. [2023-10-01]. https://hpl-mxp.org.
|
14 |
KUDO S, NITADORI K, INA T, et al. Implementation and numerical techniques for one EFlop/s HPL-AI benchmark on Fugaku[C]//Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems. Washington D. C., USA: IEEE Press, 2020: 256-266.
|
15 |
LIN R F, YUAN X H, XUE W, et al. 5 ExaFlop/s HPL-MxP benchmark with linear scalability on the 40-million-core Sunway supercomputer[C]//Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. New York, USA: ACM Press, 2023: 536-547.
|
16 |
|
17 |
SU Y. Huawei Kunpeng 920: a brave "chip". Computer & Network, 2019, 45(21): 72-73. (in Chinese)
|
18 |
CARSON E, HIGHAM N J. Accelerating the solution of linear systems by iterative refinement in three precisions. SIAM Journal on Scientific Computing, 2018, 40(2): A817-A847.
doi: 10.1137/17M1140819
|
19 |
CARSON E, HIGHAM N J. A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems. SIAM Journal on Scientific Computing, 2017, 39(6): A2834-A2856.
doi: 10.1137/17M1122918
|
20 |
HIGHAM N J, PRANESH S, ZOUNON M. Squeezing a matrix into half precision, with an application to solving linear systems. SIAM Journal on Scientific Computing, 2019, 41(4): A2536-A2551.
doi: 10.1137/18M1229511
|
21 |
HAIDAR A, TOMOV S, DONGARRA J, et al. Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers[C]//Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. Washington D. C., USA: IEEE Press, 2018: 603-613.
|
22 |
BLANCHARD P, HIGHAM N J, LOPEZ F, et al. Mixed precision block fused multiply-add: error analysis and application to GPU tensor cores. SIAM Journal on Scientific Computing, 2020, 42(3): C124-C141.
doi: 10.1137/19M1289546
|
23 |
|
24 |
|
25 |
TOMOV S, DONGARRA J. Matrix algebra on GPU and multicore architectures[C]//Proceedings of Workshop on Electronic Structure Calculation Methods on Accelerators. Washington D. C., USA: IEEE Press, 2010: 5-8.
|
26 |
|