1 |
DUBOIS R, SILVA E G, PARNAUDEAU P. High performance computing of stiff bubble collapse on CPU-GPU heterogeneous platform. Computers & Mathematics with Applications, 2021, 99, 246- 256.
|
2 |
李嘉楠, 韩林, 柴赟达. 面向国产平台的LLVM自动向量化移植与优化. 计算机工程, 2022, 48(1): 142- 148.
URL
|
|
LI J N, HAN L, CHAI Y D. Automatic vectorization transplant and optimization of LLVM for domestic processors. Computer Engineering, 2022, 48(1): 142- 148.
URL
|
3 |
胡伟方. 面向DCU的多面体编译优化技术研究[D]. 郑州: 郑州大学, 2021.
|
|
HU W F. Research on polyhedral compilation and optimization techniques for DCU[D]. Zhengzhou: Zhengzhou University, 2021. (in Chinese)
|
4 |
BABEJ M, JÄÄSKELÄINEN P. HIPCL: tool for porting CUDA applications to advanced OpenCL platforms through HIP[C]//Proceedings of International Workshop on OpenCL. New York, USA: ACM Press, 2020: 1-3.
|
5 |
姚远. SIMD自动向量识别及代码调优技术研究[D]. 郑州: 解放军信息工程大学, 2012.
|
|
YAO Y. Research on automatic SIMD vectorization recognization and code tuning technology[D]. Zhengzhou: PLA Information Engineering University, 2012. (in Chinese)
|
6 |
汪梦萱. CPU——GPU异构架构下共享内存管理策略的研究[D]. 北京: 北京工业大学, 2020.
|
|
WANG M X. Research on shared memory management strategy under CPU—GPU heterogeneous architecture[D]. Beijing: Beijing University of Technology, 2020. (in Chinese)
|
7 |
SHIROKANEV A S, ANDRIYANOV N A, ILYASOVA N Y. Development of vector algorithm using CUDA technology for three-dimensional retinal laser coagulation process modeling. Computer Optics, 2021, 45(3): 427- 437.
|
8 |
王细凯. 基于Bank划分的异构内存访存管理机制[D]. 武汉: 华中科技大学, 2016.
|
|
WANG X K. Heterogeneous memory access management mechanism based on bank partition[D]. Wuhan: Huazhong University of Science and Technology, 2016. (in Chinese)
|
9 |
杨世伟, 蒋国平, 宋玉蓉, 等. 基于GPU的稀疏矩阵存储格式优化研究. 计算机工程, 2019, 45(9): 23-31, 39.
URL
|
|
YANG S W, JIANG G P, SONG Y R, et al. Research on storage format optimization of sparse matrix based on GPU. Computer Engineering, 2019, 45(9): 23-31, 39.
URL
|
10 |
YANG Y, XIANG P, KONG J F, et al. A GPGPU compiler for memory optimization and parallelism management. ACM SIGPLAN Notices, 2010, 45(6): 86- 97.
doi: 10.1145/1809028.1806606
|
11 |
王琦, 韩林, 姚金阳, 等. 不充分SIMD向量化技术研究. 计算机应用与软件, 2018, 35(9): 108- 112.
|
|
WANG Q, HAN L, YAO J Y, et al. Research on vectorization technology for insufficient SIMD. Computer Applications and Software, 2018, 35(9): 108- 112.
|
12 |
狄棒. 异构系统内存架构的安全与数据一致性问题研究[D]. 长沙: 湖南大学, 2021.
|
|
DI B. Research on security and crash consistency of memory architecture for heterogeneous system[D]. Changsha: Hunan University, 2021. (in Chinese)
|
13 |
徐金龙, 赵荣彩, 刘鹏, 等. 程序向量化中非规则访存问题研究. 计算机工程, 2015, 41(12): 86- 90.
URL
|
|
XU J L, ZHAO R C, LIU P, et al. Research on irregular memory access problem for programs vectorization. Computer Engineering, 2015, 41(12): 86- 90.
URL
|
14 |
MEI X X, CHU X W. Dissecting GPU memory hierarchy through microbenchmarking. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(1): 72- 86.
doi: 10.1109/TPDS.2016.2549523
|
15 |
贺婷. 基于数据级自动向量化的编译优化研究综述. 智能计算机与应用, 2016, 6(6): 68- 71.
|
|
HE T. An overview of compilation and optimization of automatic vector quantization based on data level. Intelligent Computer and Applications, 2016, 6(6): 68- 71.
|
16 |
DICKSON N G, KARIMI K, HAMZE F. Importance of explicit vectorization for CPU and GPU software performance. Journal of Computational Physics, 2011, 230(13): 5383- 5398.
doi: 10.1016/j.jcp.2011.03.041
|
17 |
SU X N, HE C, LIU T Q, et al. Full parallel power flow solution: a GPU-CPU-based vectorization parallelization and sparse techniques for Newton-Raphson implementation. IEEE Transactions on Smart Grid, 2020, 11(3): 1833- 1844.
doi: 10.1109/TSG.2019.2943746
|
18 |
MOAZENI M, BUI A, SARRAFZADEH M. A memory optimization technique for software-managed scratchpad memory in GPUs[C]//Proceedings of the 7th Symposium on Application Specific Processors. Washington D. C., USA: IEEE Press, 2009: 43-49.
|
19 |
梁军, 李威, 肖琳, 等. NVIDIA Tegra K1异构计算平台访存优化研究. 计算机工程, 2016, 42(12): 44- 49.
URL
|
|
LIANG J, LI W, XIAO L, et al. Research on memory access optimization of NVIDIA tegra K1 heterogeneous computing platform. Computer Engineering, 2016, 42(12): 44- 49.
URL
|
20 |
杜晓刚, 党建武, 王阳萍. 基于CUDA的改进互信息并行计算方法. 计算机工程, 2015, 41(12): 288-292, 298.
URL
|
|
DU X G, DANG J W, WANG Y P. Improved parallel computation method of mutual information based on compute unified device architecture. Computer Engineering, 2015, 41(12): 288-292, 298.
URL
|
21 |
原建伟, 李爱国, 李文宇. GPU编程模型中存储体冲突的研究. 河北工业科技, 2013, 30(1): 39-41, 46.
|
|
YUAN J W, LI A G, LI W Y. Study of bank conflict in GPU programming model. Hebei Journal of Industrial Science and Technology, 2013, 30(1): 39-41, 46.
|
22 |
张吉赞, 古志民. 多核共享缓存bank冲突分析及其延迟最小化. 计算机学报, 2016, 39(9): 1883- 1899.
|
|
ZHANG J Z, GU Z M. Analyzing bank access conflict and minimizing bank conflict delay for shared cache in multicore. Chinese Journal of Computers, 2016, 39(9): 1883- 1899.
|
23 |
|
24 |
ZHANG Y N, QIAN H Y. Porting and optimizing G-BLASTN to the ROCm-based supercomputer[C]//Proceedings of International Conference on Computer Science and Management Technology. Washington D. C., USA: IEEE Press, 2020: 73-77.
|
25 |
赵志建. 基于CUDA并行优化的矩阵相乘算法研究. 智能计算机与应用, 2022, 12(11): 192- 196.
|
|
ZHAO Z J. Research on matrix multiplication algorithm based on CUDA parallel optimization. Intelligent Computer and Applications, 2022, 12(11): 192- 196.
|
26 |
ZHAO T, BASU P, WILLIAMS S W, et al. Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs[C]//Proceedings of HiPC 2019. New York, ACM Press: 2019: 1-10.
|