1 |
XU Z W, CHI X B, XIAO N. High-performance computing environment: a review of twenty years of experiments in China. National Science Review, 2016, 3(1): 36-48.
doi: 10.1093/nsr/nww001
|
2 |
WANG H Q, PENG S L, ZHU X Q, et al. A method to accelerate GROMACS in offload mode on Tianhe-2 supercomputer[C]//Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Washington D. C., USA: IEEE Press, 2015: 781-784.
|
3 |
PIÑEIRO C, PICHEL J C. A unified framework to improve the interoperability between HPC and Big Data languages and programming models. Future Generation Computer Systems, 2022, 134: 123-139.
doi: 10.1016/j.future.2022.04.002
|
4 |
YIN F, SHI F. A comparative survey of big data computing and HPC: from a parallel programming model to a cluster architecture. International Journal of Parallel Programming, 2022, 50(1): 27-64.
doi: 10.1007/s10766-021-00717-y
|
5 |
HEINECKE A, BREUER A, RETTENBERGER S, et al. Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers[C]//Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis. Washington D. C., USA: IEEE Press, 2014: 3-14.
|
6 |
YAN D, WANG W, CHU X. An LLVM-based open-source compiler for NVIDIA GPUs[C]//Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, USA: ACM Press, 2022: 448-449.
|
7 |
SHOBAKI G, KERBOW A, PULIDO C, et al. Exploring an alternative cost function for combinatorial register-pressure-aware instruction scheduling. ACM Transactions on Architecture and Code Optimization, 2019, 16(1): 1-30.
|
8 |
CHEN S M, WANG Y H, LIU S, et al. FT-Matrix: a coordination-aware architecture for signal processing. IEEE Micro, 2014, 34(6): 64-73.
doi: 10.1109/MM.2013.129
|
9 |
XUN C Q, CHEN Z Y, WEN M, et al. Compilation-oriented code analysis and optimization for Matrix-DSP. Computer Engineering & Science, 2020, 42(10): 1791-1800. (in Chinese)
doi: 10.3969/j.issn.1007-130X.2020.10.011
|
10 |
PANDEY M, SARDA S. LLVM cookbook. [S. l.]: Packt, 2015: 296.
|
11 |
LOZANO R C, CARLSSON M, DREJHAMMAR F, et al. Constraint-based register allocation and instruction scheduling[C]//Proceedings of International Conference on Principles and Practice of Constraint Programming. Berlin, Germany: Springer, 2012: 750-766.
|
12 |
SHOBAKI G, GORDON V S, MCHUGH P, et al. Register-pressure-aware instruction scheduling using ant colony optimization. ACM Transactions on Architecture and Code Optimization, 2022, 19(2): 23.
|
13 |
DORIGO M, MANIEZZO V, COLORNI A. Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 1996, 26(1): 29-41.
doi: 10.1109/3477.484436
|
14 |
LIU S, LU K, GUO Y, et al. A self-designed heterogeneous accelerator for exascale high performance computing. Journal of Computer Research and Development, 2021, 58(6): 1234-1237. (in Chinese)
|
15 |
GIESEMANN F, GERLACH L, PAYÁ-VAYÁ G. Evolutionary algorithms for instruction scheduling, operation merging, and register allocation in VLIW compilers. Journal of Signal Processing Systems, 2020, 92(7): 655-678.
doi: 10.1007/s11265-019-01493-2
|
16 |
LOZANO R C, CARLSSON M, BLINDELL G H, et al. Combinatorial register allocation and instruction scheduling. ACM Transactions on Programming Languages and Systems, 2019, 41(3): 17.
|
17 |
MALEKI S, GAO Y Q, GARZARÁN M J, et al. An evaluation of vectorizing compilers[C]//Proceedings of International Conference on Parallel Architectures and Compilation Techniques. Washington D. C., USA: IEEE Press, 2011: 372-382.
|
18 |
LI J N, HAN L, CHAI Y D. Automatic vectorization transplant and optimization of LLVM for domestic processors. Computer Engineering, 2022, 48(1): 142-148. (in Chinese)
|
19 |
FENG J G, HE Y P, TAO Q M. Auto-vectorization: recent development and prospect. Journal on Communications, 2022, 43(3): 180-195. (in Chinese)
|
20 |
MAMMADLI R, JANNESARI A, WOLF F. Static neural compiler optimization via deep reinforcement learning[C]//Proceedings of 2020 IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC(LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing(HiPar). Washington D. C., USA: IEEE Press, 2020: 1-10.
|
21 |
WU L, PEI J, TANG J, et al. Deep learning on graphs: methods and applications[C]//Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2022: 4906-4907.
|
22 |
WU Z H, PAN S R, CHEN F W, et al. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24.
doi: 10.1109/TNNLS.2020.2978386
|
23 |
|
24 |
WANG M J, YU L F, ZHENG D, et al. Deep graph library: towards efficient and scalable deep learning on graphs[EB/OL]. [2023-01-02]. http://arxiv.org/abs/1909.01315v1.
|
25 |
CHI H Y, CHEN C B. Survey on automatic tuning of compilers by machine learning. Computer Science, 2022, 49(1): 241-251. (in Chinese)
|