
Computer Engineering ›› 2024, Vol. 50 ›› Issue (2): 51-58. doi: 10.19678/j.issn.1000-3428.0067536

• Hot Topics and Reviews •

Algorithm Implementation and Optimization of Symmetric Matrix Eigenvalue Solution for FT-M6678

Li YU1,*, Lin HAN2, Youcai LUO1, Jiandong SHANG2

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China
  2. National Supercomputing Center in Zhengzhou, Zhengzhou 450001, Henan, China
  • Received: 2023-05-04 Published: 2024-02-15 Released: 2023-08-17
  • Contact: Li YU
  • Funding:
    Major Science and Technology Project of Henan Province (221100210600)

Abstract:

The domestically developed, autonomous and controllable FT-M6678 platform currently has no implementation of a symmetric matrix eigenvalue solver, and the mathematical libraries available on the platform do not adequately meet the needs of this class of problems. This study implements and optimizes the symmetric matrix eigenvalue solution (SYEV) algorithm for the FT-M6678 processor, extending the platform's linear algebra library. Based on an analysis of the SYEV implementation and its runtime hotspots, three kinds of optimization are applied on the FT-M6678 platform: compilation optimization, memory access optimization, and vector parallelization. Compilation optimization uses different compiler options to guide the compiler in optimizing the program for speed. Memory access optimization comprises cache optimization and optimized allocation of data and program segments, improving the efficiency of matrix data access. Vector parallelization comprises loop unrolling and Single Instruction Multiple Data (SIMD) instruction-level parallel optimization adapted to the FT-M6678 platform, improving the computational efficiency of the program. The implemented and optimized algorithm is verified for correctness and analyzed for performance on the FT-M6678 platform. The results show that the algorithm passes the official Linear Algebra PACKage (LAPACK) test suite, that the optimizations achieve a speedup of up to 58.346 times on the FT-M6678 platform, and that speed improves by up to 2.053 times compared with the TMS320C6678 platform.

Key words: symmetric matrix eigenvalue, FT-M6678 platform, hotspot analysis, cache optimization, vector parallelism
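
For readers unfamiliar with the problem that SYEV addresses, the following is a minimal sketch of a symmetric eigenvalue solver. It uses the classical cyclic Jacobi rotation method, which is swapped in here purely for brevity; it is not the method described in the paper, and it is not the tridiagonal-reduction approach used by LAPACK's SYEV routines. The function name `jacobi_eigenvalues` and its parameters `tol` and `max_sweeps` are hypothetical and chosen only for this illustration.

```python
import math


def jacobi_eigenvalues(a, tol=1e-18, max_sweeps=50):
    """Return the eigenvalues of a symmetric matrix, sorted ascending.

    Illustrative only: `a` is a square list of lists assumed to be symmetric.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy so the caller's matrix is untouched
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries: zero means the matrix is diagonal.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Skip entries that are already (numerically) zero.
                if abs(a[p][q]) < 1e-300:
                    continue
                # Choose the rotation (c, s) that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0)
                )
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # Apply the similarity transform A <- J^T A J:
                # first update columns p and q (A J) ...
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # ... then rows p and q (J^T A).
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    # After convergence the diagonal holds the eigenvalues.
    return sorted(a[i][i] for i in range(n))


if __name__ == "__main__":
    # The eigenvalues of [[2, 1], [1, 2]] are 1 and 3 (up to rounding).
    print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))
```

The inner update loops over `k` are the kind of regular, data-parallel loops that loop unrolling and SIMD instructions target in general. Whether the paper's hotspots have this shape is not stated in the abstract.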