Abstract: Accelerators are increasingly viewed as coprocessors that can deliver significant computational performance at low cost. Taking the Fast Multipole Method (FMM) for the N-body problem, a classic problem in high-performance computing, as an example, this paper analyzes each step of the algorithm and classifies its sub-procedures by their computational, communication, and storage characteristics. Each sub-procedure is implemented and tested on CPU, GPU, FPGA, and CELL platforms. The paper makes two contributions to optimizing FMM. First, it presents a configuration scheme for a mixed reconfigurable computer architecture that runs FMM efficiently. Second, it further optimizes FMM on this architecture by decomposing its task flow and, based on extensive experimental results, proposes feasible solutions tailored to the characteristics of each task flow. Results show that the scheme improves the efficiency of the algorithm.
Key words:
mixed reconfigurable computer architecture,
acceleration component,
N-body problem,
Fast Multipole Method (FMM) algorithm,
configuration scheme,
task flow
曹旻, 李海强, 曹真. 基于混合架构的FMM算法硬件加速[J]. 计算机工程, 2012, 38(16): 275-278.
CAO Min, LI Hai-Qiang, CAO Zhen. Hardware Acceleration of FMM Algorithm Based on Mixed Architecture[J]. Computer Engineering, 2012, 38(16): 275-278.