作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2024, Vol. 50 ›› Issue (10): 362-369. doi: 10.19678/j.issn.1000-3428.0068180

• 开发研究与工程应用 • 上一篇    下一篇

一种用于软件预取的访存轨迹采样算法

刘大兴, 顾乃杰*(), 黄章进, 苏俊杰, 齐东升   

  1. 中国科学技术大学计算机科学与技术学院, 安徽 合肥 230027
  • 收稿日期:2023-08-01 出版日期:2024-10-15 发布日期:2024-03-06
  • 通讯作者: 顾乃杰
  • 基金资助:
    国家自然科学基金(U20A20229)

A Sampling Algorithm for Software Prefetching Using Memory Access Traces

LIU Daxing, GU Naijie*(), HUANG Zhangjin, SU Junjie, QI Dongsheng   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
  • Received:2023-08-01 Online:2024-10-15 Published:2024-03-06
  • Contact: GU Naijie

摘要:

软件预取作为提升数据存取性能的一种重要技术, 得到了广泛的关注和应用。在软件预取的研究中, 往往需要使用访存轨迹分析结合采样算法来筛选出存在缓存未命中的访存指令作为预取目标。然而, 传统的迸发采样算法无法区分不同类型的轨迹信息, 且容易遗漏访问次数较少的指令。针对以上问题, 提出一种基于单遍聚类和分层采样的访存轨迹采样算法。首先提取访存轨迹信息特征; 然后利用单遍聚类方法并依据特征相似程度进行访存信息聚类; 最后以聚类为基础进行分层采样, 根据缓存未命中率对轨迹中不同的部分合理分配注意力来调整采样比, 有效缓解了规模较小类别的采样遗漏情况。实验结果显示, 在选择的8个测试程序上, 相比于传统迸发采样算法, 所提算法可平均多覆盖15.70%的缓存未命中指令, 基于所提算法的预取平均可额外减少20.76%的缓存未命中数和3.51%的程序运行时间。

关键词: 分层采样, 访存轨迹, 软件预取, 迸发采样, 单遍聚类

Abstract:

Software prefetching, an important technology for improving data access performance, has received widespread attention and has been used in several applications. Software prefetching often necessitates memory access trace analysis combined with a sampling algorithm to filter memory access instructions with cache misses as prefetching targets. However, traditional burst sampling algorithms cannot distinguish between different types of trace information and are prone to missing instructions with fewer accesses. To address these issues, a new sampling algorithm for memory access traces is proposed based on single-pass clustering and stratified sampling. First, the features of the memory trace information are extracted. Subsequently, a single-pass clustering method is used to cluster the memory information according to the similarity of features. Finally, stratified sampling is performed based on clustering, and the sampling ratio is adjusted according to the cache miss rate, which effectively mitigates the sampling omission of smaller categories. The experimental results show that the proposed algorithm covers an average of 15.70% more cache misses in the selected eight test programs than the traditional burst sampling algorithm. Prefetching based on the proposed sampling algorithm reduces the number of cache misses by 20.76% and program runtime by 3.51%.

Key words: stratified sampling, memory access trace, software prefetching, burst sampling, single-pass clustering