References
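[1]LIU Ying,LÜ Fang,WANG Lei,et al.Research and Progress on Heterogeneous Parallel Programming Models[J].Journal of Software,2014,25(7):1459-1475.(in Chinese)
[2]ZHU Yongzhi,WANG Guoren,LI Bingfeng,et al.Implementation of a Scalable Model for Heterogeneous Parallel Computing Systems[J].Computer Engineering,2009,35(17):97-99.(in Chinese)
[3]WANG Tao.Research on GPU-based Program Analysis and Parallelization[D].Zhengzhou:PLA Information Engineering University,2010.(in Chinese)
[4]SHENG Chongchong,HU Xinming,LI Jiajia,et al.Programming Framework for Node-heterogeneous GPU Clusters[J].Computer Engineering,2015,41(2):292-297.(in Chinese)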
[5]GUZ Z,BOLOTIN E,KEIDAR I,et al.Many-core vs.Many-thread Machines:Stay Away from the Valley[J].IEEE Computer Architecture Letters,2009,8(1):25-28.
[6]EECKHOUT L.Computer Architecture Performance Evaluation Methods[M].[S.l.]:Morgan & Claypool Publishers,2010.
[7]LIM J,LAKSHMINARAYANA N.Power Modeling for GPU Architecture Using McPAT[Z].Argonne National Laboratory,2013:1-17.
[8]BAKHODA A,YUAN G L,FUNG W W L,et al.Analyzing CUDA Workloads Using a Detailed GPU Simulator[C]//Proceedings of ISPASS’09.Washington D.C.,USA:IEEE Press,2009:163-174.
[9]NOWATZKI T,MENON J,HO C H,et al.Architectural Simulators Considered Harmful[J].IEEE Micro,2015,35(6):4-12.
[10]HONG S,KIM H.An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness[J].ACM SIGARCH Computer Architecture News,2009,37(3):152.
[11]GUZ Z,ITZHAK O,KEIDAR I,et al.Threads vs.Caches:Modeling the Behavior of Parallel Workloads[C]//Proceedings of 2010 IEEE International Conference on Computer Design.Washington D.C.,USA:IEEE Press,2010:274-281.
[12]CZECHOWSKI K,BATTAGLINO C,MCCLANAHAN C,et al.On the Communication Complexity of 3D FFTs and Its Implications for Exascale[C]//Proceedings of the 26th ACM International Conference on Supercomputing.New York,USA:ACM Press,2012:216-222.
[13]GÓMEZ-LUNA J,GONZÁLEZ-LINARES J M.Performance Models for Asynchronous Data Transfers on Consumer Graphics Processing Units[J].Journal of Parallel and Distributed Computing,2012,72(9):1117-1126.
[14]VAN WERKHOVEN B,MAASSEN J,SEINSTRA F J,et al.Performance Models for CPU-GPU Data Transfers[C]//Proceedings of the 14th IEEE/ACM International Symposium on Cluster,Cloud and Grid Computing.Washington D.C.,USA:IEEE Press,2014:11-20.
[15]WILLIAMS S,WATERMAN A,PATTERSON D.Roofline:An Insightful Visual Performance Model for Multicore Architectures[J].Communications of the ACM,2009,52(4):65-76.
[16]GRILLO L,REYES R,DE SANDE F.Performance Evaluation of OpenACC Compilers[C]//Proceedings of Euromicro International Conference on Parallel,Distributed,and Network-based Processing.Washington D.C.,USA:IEEE Press,2014:656-663.
Editor: SUO Shuzhi