Computer Engineering ›› 2012, Vol. 38 ›› Issue (16): 275-278. doi: 10.3969/j.issn.1000-3428.2012.16.072

• Development Research and Design Technology •

Hardware Acceleration of FMM Algorithm Based on Mixed Architecture

CAO Min, LI Hai-qiang, CAO Zhen

  1. (School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China)
  • Received: 2011-08-30 Revised: 2011-12-06 Online: 2012-08-20 Published: 2012-08-17
  • About the authors: CAO Min (born 1966), female, Ph.D., main research interests: high-performance computing and distributed computing; LI Hai-qiang and CAO Zhen, master's students
  • Supported by:

    National High Technology Research and Development Program of China ("863" Program) (2009AA012201-CFA2009SHDX01); Shanghai Leading Academic Discipline Project (J50103)

Abstract: Accelerators are increasingly viewed as coprocessors that can deliver significant computational performance at low cost. Taking the Fast Multipole Method (FMM) for the N-body problem, a classic problem in high-performance computing, as an example, this paper analyzes each step of the FMM algorithm and classifies its sub-procedures according to their computation, communication, and storage characteristics. The sub-procedures are implemented and tested on CPU, GPU, FPGA, and CELL. Based on these tests, the paper proposes a mixed reconfigurable architecture configuration scheme for running FMM. It then further optimizes the algorithm by decomposing it into task flows, and proposes a feasible solution for each kind of task flow according to its characteristics. Results show that the scheme improves the efficiency of the algorithm.

Key words: mixed reconfigurable computer architecture, acceleration component, N-body problem, Fast Multipole Method (FMM) algorithm, configuration scheme, task flow
