1 |
MARTORELL X, AYGUADÉ E, NAVARRO N, et al. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors[C]//Proceedings of the 13th International Conference on Supercomputing. New York, USA: ACM Press, 1999: 284-301.
|
2 |
LU K, WANG Y H, GUO Y, et al. MT-3000: a heterogeneous multi-zone processor for HPC. CCF Transactions on High Performance Computing, 2022, 4(2): 150-164.
doi: 10.1007/s42514-022-00095-y
|
3 |
DONGARRA J, STERLING T, SIMON H, et al. High-performance computing: clusters, constellations, MPPs, and future directions. Computing in Science & Engineering, 2005, 7(2): 51-59.
doi: 10.1109/MCSE.2005.34
|
4 |
DIAZ J, MUÑOZ-CARO C, NIÑO A. A survey of parallel programming models and tools in the multi and many-core era. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(8): 1369-1386.
doi: 10.1109/TPDS.2011.308
|
5 |
HACKENBERG D, JUCKELAND G, BRUNST H. Performance analysis of multi-level parallelism: inter-node, intra-node and hardware accelerators. Concurrency and Computation: Practice and Experience, 2012, 24(1): 62-72.
doi: 10.1002/cpe.1725
|
6 |
TU B B, ZOU M, ZHAN J F, et al. Research on parallel computation model with memory hierarchy on multi-core clusters. Chinese Journal of Computers, 2008, 31(11): 1948-1955. (in Chinese)
doi: 10.3321/j.issn:0254-4164.2008.11.009
|
7 |
ZHANG Y Q, YUAN L, CHEN Y F, et al. Multi-level discontinuous and nonlinear scalability phenomenon in high performance computing. Chinese Journal of Computers, 2020, 43(6): 973-989. (in Chinese)
doi: 10.11897/SP.J.1016.2020.00973
|
8 |
ZANG D W, CAO Z, SUN N H. The development of high-performance computing. Science & Technology Review, 2016, 34(14): 22-28. (in Chinese)
|
9 |
JIN Z, LU Z H, LI H Y, et al. Origin of high performance computing: current status and developments of scientific computing applications. Bulletin of Chinese Academy of Sciences, 2019, 34(6): 625-639. (in Chinese)
|
10 |
WANG T. High performance computing in quantum chemistry. Journal of East China Normal University (Natural Science), 2018(4): 109-119. (in Chinese)
|
11 |
GARCIA-GASULLA M, BANCHELLI F, PEIRO K, et al. A generic performance analysis technique applied to different CFD methods for HPC. International Journal of Computational Fluid Dynamics, 2020, 34(7/8): 508-528.
doi: 10.1080/10618562.2020.1778168
|
12 |
|
13 |
|
14 |
RASCH A, SCHULZE R, STEUWER M, et al. Efficient auto-tuning of parallel programs with interdependent tuning parameters via Auto-Tuning Framework (ATF). ACM Transactions on Architecture and Code Optimization, 2021, 18(1): 1-26.
doi: 10.1145/3427093
|
15 |
MENON H, BHATELE A, GAMBLIN T. Auto-tuning parameter choices in HPC applications using Bayesian optimization[C]//Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS). Washington D.C., USA: IEEE Press, 2020: 831-840.
|
16 |
FRIGO M, JOHNSON S G. FFTW: an adaptive software architecture for the FFT[C]//Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D.C., USA: IEEE Press, 1998: 1381-1384.
|
17 |
GU L, LI X M. DFT performance prediction in FFTW[C]//Proceedings of the 22nd International Workshop on Languages and Compilers for Parallel Computing. Berlin, Germany: Springer, 2010: 140-156.
|
18 |
|
19 |
BECKINGSALE D, PEARCE O, LAGUNA I, et al. Apollo: reusable models for fast, dynamic tuning of input-dependent code[C]//Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS). Washington D.C., USA: IEEE Press, 2017: 307-316.
|
20 |
MATTSON T G, CLEDAT R, CAVÉ V, et al. The Open Community Runtime: a runtime system for extreme scale computing[C]//Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC). Washington D.C., USA: IEEE Press, 2016: 1-7.
|
21 |
DOKULIL J, SANDRIESER M, BENKNER S. OCR-Vx: an alternative implementation of the Open Community Runtime[C]//Proceedings of the International Workshop on Runtime Systems for Extreme Scale Programming Models and Architectures in Conjunction with SC15. Washington D.C., USA: IEEE Press, 2015: 1-10.
|
22 |
DOKULIL J, BENKNER S. Automatic placement of tasks to NUMA nodes in iterative applications[C]//Proceedings of the 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing. Washington D.C., USA: IEEE Press, 2020: 192-195.
|
23 |
KRESSE G, FURTHMÜLLER J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Computational Materials Science, 1996, 6(1): 15-50.
doi: 10.1016/0927-0256(96)00008-0
|
24 |
GIANNOZZI P, BARONI S, BONINI N, et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. Journal of Physics: Condensed Matter, 2009, 21(39): 395502.
doi: 10.1088/0953-8984/21/39/395502
|
25 |
STEGAILOV V, SMIRNOV G, VECHER V. VASP hits the memory wall: processors efficiency comparison. Concurrency and Computation: Practice and Experience, 2019, 31(19): 5136.
doi: 10.1002/cpe.5136
|
26 |
STEGAILOV V, VECHER V. Efficiency analysis of Intel and AMD x86_64 architectures for ab initio calculations: a case study of VASP. Berlin, Germany: Springer, 2017.
|
27 |
|
28 |
MCINTOSH-SMITH S, PRICE J, DEAKIN T, et al. A performance analysis of the first generation of HPC-optimized Arm processors. Concurrency and Computation: Practice and Experience, 2019, 31(16): 5110.
doi: 10.1002/cpe.5110
|
29 |
WU G B, SHEN Y, ZHANG W S, et al. Runtime prediction of jobs for backfilling optimization. Journal of Chinese Computer Systems, 2019, 40(1): 6-12.
|
30 |
WANG Y R, DU Z H, JIANG J, et al. Modeling the parallel efficiency of density functional theory based jobs on Sunway TaihuLight[C]//Proceedings of the IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). Washington D.C., USA: IEEE Press, 2019: 199-204.
|
31 |
|
32 |
|