[1] CHI H Y, CHEN C B. Survey on automatic tuning of compilers by machine learning[J]. Computer Science, 2022, 49(1): 241-251. (in Chinese)
[2] LI M Z, LIU Y, LIU X Y, et al. The deep learning compiler: a comprehensive survey[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(3): 708-727.
[3]
[4] ZHAO J, LI B J, NIE W, et al. AKG: automatic kernel generation for neural processing units using polyhedral transformations[C]//Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York, USA: ACM Press, 2021: 1233-1248.
[5] LIANG G M, YUAN C Y, YU M S, et al. The support of MLIR HLS adaptor for LLVM IR[C]//Proceedings of the 51st International Conference on Parallel Processing. New York, USA: ACM Press, 2022: 1-8.
[6] VASILACHE N, ZINENKO O, THEODORIDIS T, et al. Tensor comprehensions: framework-agnostic high-performance machine learning abstractions[EB/OL]. [2023-07-01]. https://arxiv.org/pdf/1802.04730.
[7] ZHENG L M, JIA C F, SUN M M, et al. Ansor: generating high-performance tensor programs for deep learning[C]//Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation. Berkeley, USA: USENIX Association, 2020: 863-879.
[8] YANG S C, ZHAO R C, HAN L, et al. Vectorization optimization of shared memory access for DCU[J]. Computer Engineering, 2024, 50(2): 206-213. (in Chinese)
[9] ZHU H Y, WU R F, DIAO Y J, et al. Roller: fast and efficient tensor compilation for deep learning[C]//Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation. Carlsbad, USA: USENIX Association, 2022: 233-248.
[10] LIU G H, LI Y, WANG X L. Optimization of deep learning compiler acceleration technology for aerospace heterogeneous platforms[J]. Aerospace Control, 2022, 40(2): 60-65. (in Chinese)
[11] PAN Q H, HE S B, CHEN G, et al. An acceleration method for exploring the optimization space in deep learning compilers: 112579063[P]. 2021-06-08. (in Chinese)
[12] ZHAO J Q. Research on compiler auto-tuning method based on deep reinforcement learning[D]. Xi'an: Northwest University, 2022. (in Chinese)
[13] ABADI M, BARHAM P, CHEN J M, et al. TensorFlow: a system for large-scale machine learning[C]//Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16). Savannah, USA: USENIX Association, 2016: 265-283.
[14]
[15] GASKILL B. ONNX: the open neural network exchange format[J]. Linux Journal, 2018(285): 157-161.
[16] ROESCH J, LYUBOMIRSKY S, WEBER L, et al. Relay: a new IR for machine learning frameworks[C]//Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. New York, USA: ACM Press, 2018: 58-68.
[17] SHEN Y F, SHEN F, LI F, et al. Deep neural network model acceleration method based on tensor virtual machine[J]. Journal of Computer Applications, 2023, 43(9): 2836-2844. (in Chinese)
[18] CHEN T Q, ZHENG L M, YAN E, et al. Learning to optimize tensor programs[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates, 2018: 3393-3404.
[19] WU J Y, BELEVICH A, BENDERSKY E, et al. gpucc: an open-source GPGPU compiler[C]//Proceedings of the 2016 International Symposium on Code Generation and Optimization. New York, USA: ACM Press, 2016: 105-116.
[20] VIKHAR P A. Evolutionary algorithms: a critical review and its future prospects[C]//Proceedings of the International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC). Washington D.C., USA: IEEE Press, 2016: 261-265.
[21]
[22] FRIEDMAN J H. Greedy function approximation: a gradient boosting machine[J]. The Annals of Statistics, 2001, 29(5): 1189-1232.
[23] CHEN T Q, GUESTRIN C. XGBoost: a scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2016: 785-794.
[24] KE G L, MENG Q, FINLEY T, et al. LightGBM: a highly efficient gradient boosting decision tree[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates, 2017: 3149-3157.
[25] BENTÉJAC C, CSÖRGŐ A, MARTÍNEZ-MUÑOZ G. A comparative analysis of gradient boosting algorithms[J]. Artificial Intelligence Review, 2021, 54(3): 1937-1967.
[26] HOLEWINSKI J, POUCHET L N, SADAYAPPAN P. High-performance code generation for stencil computations on GPU architectures[C]//Proceedings of the 26th ACM International Conference on Supercomputing. New York, USA: ACM Press, 2012: 311-320.