Computer Engineering, 2020, Vol. 46, Issue (8): 210-215, 222. doi: 10.19678/j.issn.1000-3428.0055295

• Architecture and Software Technology •

Analysis of Subroutine-Level Speculative Parallelism in HPEC

WANG Xinyi (1a,1b), WANG Yaobin (1a,1b), LI Ling (1a,1b), YANG Yang (2), BU Deqing (1a,1b), LIU Zhiqin (1a,1b)

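  1. Southwest University of Science and Technology, Mianyang, Sichuan 621010, China: a. School of Computer Science and Technology; b. Sichuan Civil-Military Integration Institute;
    2. Sichuan Institute of Computer Sciences, Chengdu 610041, China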
  • Received: 2019-06-25  Revised: 2019-08-19  Published: 2020-08-10
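  • About the authors: WANG Xinyi (1996-), female, master's student, research interest: computer architecture; WANG Yaobin (corresponding author), professor, Ph.D.; LI Ling, lecturer, Ph.D.; YANG Yang, engineer, M.S.; BU Deqing, master's student; LIU Zhiqin, professor.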
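  • Funding:
    National Natural Science Foundation of China (61672438); China Scholarship Council (CSC201908510040); Sichuan Science and Technology Program (2019YJ0326); Research Project of the Sichuan Provincial Department of Education (18ZB0603); Research Projects of Southwest University of Science and Technology (18lzx451, 17lzx621); Postgraduate Innovation Fund of Southwest University of Science and Technology (19ycx0051).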

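Abstract: Effective use of Thread-Level Speculation (TLS) can improve the hardware resource utilization of multicore chips, and TLS has achieved good results in the automatic parallelization of a variety of serial applications. However, subroutine-level thread speculation in HPEC applications has not yet been analyzed effectively. To address this problem, this paper designs a profiling mechanism for subroutine-level speculation together with its core data structures, selects seven representative programs from HPEC, and uncovers their maximum potential parallelism at the subroutine level. It then analyzes the speedup of each program in terms of thread granularity, parallel coverage, number of subroutine calls, data dependencies and source code. Experimental results show that the speedups of the fdfir, svd, db and ga programs range from 2.23 to 11.31, and that tdfir benefits most, reaching a speedup of 221.78. Applications that make many calls to subroutines without heavy data dependencies are better suited to having their parallelism tested with subroutine-level TLS.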
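The conclusion that many subroutine calls with light data dependencies favor subroutine-level TLS can be illustrated with a toy limit model. This is a minimal sketch under stated assumptions, not the paper's profiling mechanism: the `CallRecord` layout and the `ideal_speedup` function are hypothetical, and the model assumes an idealized oracle (unlimited cores, no spawn, commit or squash overhead, and only true read-after-write dependences delaying a call). Under those assumptions, speedup is total sequential cost divided by the length of the dependence-constrained schedule.

```python
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    """One dynamic subroutine call observed in a profiled sequential run.

    Hypothetical layout for illustration only; not the paper's data structure.
    """
    name: str
    cost: int                                  # sequential execution time (e.g. instructions)
    reads: set = field(default_factory=set)    # memory addresses the call reads
    writes: set = field(default_factory=set)   # memory addresses the call writes


def ideal_speedup(trace):
    """Return sequential time / idealized parallel time for a list of CallRecords.

    Assumed oracle model (hypothetical): unlimited cores, zero spawn, commit and
    squash overhead, write-after-read and write-after-write conflicts absorbed by
    speculative buffering, so a call waits only for the most recent earlier call
    that wrote an address it reads (a true read-after-write dependence).
    """
    last_writer_end = {}  # address -> finish time of its latest writer in program order
    parallel_time = 0
    for rec in trace:
        # Earliest start: after every producer of a value this call consumes.
        start = max((last_writer_end.get(a, 0) for a in rec.reads), default=0)
        end = start + rec.cost
        # Record this call as the latest writer of each address it writes.
        for a in rec.writes:
            last_writer_end[a] = end
        parallel_time = max(parallel_time, end)
    sequential_time = sum(rec.cost for rec in trace)
    if parallel_time == 0:
        return 1.0  # empty or zero-cost trace: no measurable speedup
    return sequential_time / parallel_time


# Eight calls touching disjoint data versus eight calls forming a dependence chain.
independent = [CallRecord(f"f{i}", 10, reads={("in", i)}, writes={("out", i)}) for i in range(8)]
chained = [CallRecord(f"g{i}", 10, reads={i}, writes={i + 1}) for i in range(8)]
print(ideal_speedup(independent))  # 8.0
print(ideal_speedup(chained))      # 1.0
```

With no shared data the eight calls overlap completely, while the chain, in which each call reads its predecessor's output, runs no faster than the sequential program. A realistic limit study would also have to account for thread granularity, spawn overhead and the fraction of execution covered by subroutine calls, which this toy model ignores.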

Key words: Thread-Level Speculation (TLS), multicore chips, HPEC benchmark suite, data dependency, dynamic profiling

