[1]
[2] 何晓斌, 高洁, 肖伟, 等. 应用透明的超算多层存储加速技术研究. 计算机工程, 2022, 48(12): 1-8. doi: 10.19678/j.issn.1000-3428.0065928
HE X B, GAO J, XIAO W, et al. Research on application-transparent supercomputing multi-tier storage acceleration technology. Computer Engineering, 2022, 48(12): 1-8.
[3] PUMMA S, SI M, FENG W C, et al. Scalable deep learning via I/O analysis and optimization. ACM Transactions on Parallel Computing, 2019, 6(2): 1-34.
[4] PATEL T, BYNA S, LOCKWOOD G K, et al. Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. New York, USA: ACM Press, 2019: 1-13.
[5] ISAKOV M, DEL ROSARIO E, MADIREDDY S, et al. HPC I/O throughput bottleneck analysis with explainable local models[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Washington D.C., USA: IEEE Press, 2020: 1-10.
[6] PATEL T, BYNA S. Uncovering access, reuse, and sharing characteristics of I/O-intensive files on large-scale production HPC systems[C]//Proceedings of the 18th USENIX Conference on File and Storage Technologies. [S.l.]: USENIX, 2020: 91-101.
[7] WANG F Y, SIM H, HARR C, et al. Diving into petascale production file systems through large scale profiling and analysis[C]//Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems. New York, USA: ACM Press, 2017: 37-42.
[8] DAI Y Q, DONG Y, LU K, et al. Towards scalable resource management for supercomputers[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Washington D.C., USA: IEEE Press, 2022: 324-338.
[9] GARLICK J E. I/O forwarding on Livermore Computing commodity Linux clusters: LLNL-TR-609233[R]. Livermore, USA: Lawrence Livermore National Laboratory, 2012: 1-9.
[10] PAUL A K, FAALAND O, MOODY A, et al. Understanding HPC application I/O behavior using system level statistics[C]//Proceedings of the IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC). Washington D.C., USA: IEEE Press, 2020: 202-211.
[11] 周隆放, 杨文祥, 韩永国, 等. 作业名层次化聚类算法预测作业运行时间. 国防科技大学学报, 2022, 44(5): 13-23. doi: 10.11887/j.cn.202205002
ZHOU L F, YANG W X, HAN Y G, et al. Predicting the job running time with job name hierarchical clustering algorithm. Journal of National University of Defense Technology, 2022, 44(5): 13-23.
[12] XIAN G, ZHANG X R, YU J, et al. PreF: predicting job failure on supercomputers with job path and user behavior. Concurrency and Computation: Practice and Experience, 2022, 34(23). doi: 10.1002/cpe.7202
[13] 唐阳坤, 鲜港, 杨文祥, 等. 基于用户行为的超级计算机作业失败预测方法. 计算机工程与科学, 2022, 44(10): 1753-1761.
TANG Y K, XIAN G, YANG W X, et al. Job failure prediction based on user behavior on supercomputers. Computer Engineering & Science, 2022, 44(10): 1753-1761.
[14] ZHANG H T, XIAN G, YANG W X, et al. A study of job failure prediction on supercomputers with application semantic enhancement. Journal of Computing Science and Engineering, 2022, 16(4): 222-232.
[15]
[16] LOCKWOOD G K, YOO W, BYNA S, et al. UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis[C]//Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems. New York, USA: ACM Press, 2017: 55-60.
[17] KUNKEL J M, BETKE E, BRYSON M, et al. Tools for analyzing parallel I/O[C]//Proceedings of the 2018 International Workshops on High Performance Computing. Berlin, Germany: Springer, 2018: 49-70.
[18] LOCKWOOD G K, WRIGHT N, SNYDER S, et al. TOKIO on ClusterStor: connecting standard tools to enable holistic I/O performance analysis[EB/OL]. [2023-06-15]. https://www.osti.gov/biblio/1632125.
[19] PARK B H, HUKERIKAR S, ADAMSON R, et al. Big data meets HPC log analytics: scalable approach to understanding systems at extreme scale[C]//Proceedings of the IEEE International Conference on Cluster Computing. Washington D.C., USA: IEEE Press, 2017: 758-765.
[20] NEUWIRTH S, PAUL A K. Parallel I/O evaluation techniques and emerging HPC workloads: a perspective[C]//Proceedings of the IEEE International Conference on Cluster Computing. Washington D.C., USA: IEEE Press, 2021: 671-679.
[21] LIU Z C, LEWIS R, KETTIMUTHU R, et al. Characterization and identification of HPC applications at leadership computing facility[C]//Proceedings of the 34th ACM International Conference on Supercomputing. New York, USA: ACM Press, 2020: 1-12.
[22] LU S, LUO B, PATEL T, et al. Making disk failure predictions SMARTer![C]//Proceedings of the 18th USENIX Conference on File and Storage Technologies. [S.l.]: USENIX, 2020: 151-168.
[23] CHIEN S W D, PODOBAS A, PENG I B, et al. tf-Darshan: understanding fine-grained I/O performance in machine learning workloads[C]//Proceedings of the IEEE International Conference on Cluster Computing. Washington D.C., USA: IEEE Press, 2020: 359-370.
[24] MADIREDDY S, BALAPRAKASH P, CARNS P, et al. Analysis and correlation of application I/O performance and system-wide I/O activity[C]//Proceedings of the International Conference on Networking, Architecture, and Storage. Washington D.C., USA: IEEE Press, 2017: 1-10.
[25] KIM S, SUNG D K, SON Y. IFLustre: towards interference-free and efficient storage allocation in distributed file system[C]//Proceedings of the 30th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. Washington D.C., USA: IEEE Press, 2022: 105-112.
[26] YANG B, ZOU Y L, LIU W G, et al. An end-to-end and adaptive I/O optimization tool for modern HPC storage systems[C]//Proceedings of the IEEE International Parallel and Distributed Processing Symposium. Washington D.C., USA: IEEE Press, 2022: 1294-1304.