作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2022, Vol. 48 ›› Issue (7): 199-205,213. doi: 10.19678/j.issn.1000-3428.0062198

• 体系结构与软件技术 • 上一篇    下一篇

FPGA架构上面向稀疏矩阵求解的静态调度算法

王晞阳1, 陈继林2, 李猛1, 刘首文3   

  1. 1. 国家超级计算无锡中心, 江苏 无锡 214072;
    2. 中国电力科学研究院有限公司, 北京 100192;
    3. 国网湖北省电力有限公司, 武汉 430070
  • 收稿日期:2021-07-29 修回日期:2021-09-30 出版日期:2022-07-15 发布日期:2021-10-11
  • 作者简介:王晞阳(1976—),男,高级工程师、硕士,主研方向为并行算法与体系结构;陈继林,高级工程师、硕士;李猛,助理工程师;刘首文,高级工程师、博士。
  • 基金资助:
    国家电网公司科技项目“适应于电力系统应用的高性能计算技术研究与开发”(XT71-19-022)。

Static Scheduling Algorithm for Solving Sparse Matrixes on FPGA Architecture

WANG Xiyang1, CHEN Jilin2, LI Meng1, LIU Shouwen3   

  1. 1. National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214072, China;
    2. China Electronic Power Research Institute, Beijing 100192, China;
    3. State Grid Hubei Electric Power Co., Ltd., Wuhan 430070, China
  • Received:2021-07-29 Revised:2021-09-30 Online:2022-07-15 Published:2021-10-11

摘要: 在电力系统仿真中,大型稀疏矩阵的求解会消耗大量存储和计算资源,未有效利用矩阵的稀疏性将导致存储空间浪费以及计算效率低下的问题。当前关于稀疏矩阵求解算法的研究主要针对众核加速硬件,聚焦于挖掘层次集合的并行度以提升算法的并行效率,而在众核处理器架构上频繁地进行缓存判断及细粒度访问可能导致潜在的性能问题。针对基于现场可编程门阵列(FPGA)的下三角稀疏矩阵求解问题,在吴志勇等设计的FPGA稀疏矩阵求解器硬件结构的基础上,提出一种静态调度求解算法。通过对稀疏矩阵进行预处理,设计数据分布和指令排布流程,将下三角稀疏矩阵的求解过程静态映射到多个FPGA片上的处理单元,以实现下三角稀疏矩阵在FPGA上的并行高速求解。将串行算法中所有的隐式并行关系排布到缓冲中,使得所有计算单元都能实现计算、访存和单元间通信的高效并行,从而最大限度地利用FPGA的硬件资源。典型算例上的测试结果表明,相较传统的CPU/GPU求解算法,该算法能够实现5~10倍的加速效果。

关键词: 下三角稀疏矩阵, 静态调度算法, 数据分布, 指令排布, 静态映射

Abstract: In power system simulations, the solution of large-scale sparse matrixes will consume a lot of storage and computing resources, and a failure to effectively utilize the sparsity of the matrix will lead to a waste of storage space and low computing efficiency.Current studies on algorithms that solve sparse matrixes focus mainly on the multi-core acceleration hardware, with emphasis on mining the parallelism of hierarchical sets to improve the parallel efficiency of the algorithm.Frequent cache judgment and fine-grained access on the multi-core processor architecture may lead to potential performance problems.To solve the lower triangular sparse matrix based on Field Programmable Gate Array(FPGA), a static scheduling algorithm is proposed based on the hardware structure of an FPGA sparse matrix solver designed by WU Zhiyong et al.By preprocessing the sparse matrix, designing the data distribution and instruction layout process, the solution process of the lower triangular sparse matrix is statically mapped to multiple Process Elements(PEs) on an FPGA chip to realize the parallel high-speed solution of the lower triangular sparse matrix on the FPGA.By arranging all implicit parallel relations in the serial algorithm into the buffer, all computing units can achieve efficient parallel computing, memory access, and inter unit communication, enabling users to maximize the use of FPGA hardware resources.The test results obtained using typical examples show that compared with traditional CPU/GPU algorithms, the proposed algorithm can achieve an acceleration that is 5-10 times greater than existing methods.

Key words: sparse matrix of the lower triangular, static scheduling algorithm, data distribution, instruction layout, static mapping

中图分类号: