Computer Engineering (计算机工程)

File Access Heat Prediction for Hierarchical Storage in High-Energy Physics

  

  • Published: 2020-03-25

Abstract: High-energy physics computing is a typical data-intensive workload, and file-based hierarchical storage with a unified namespace is widely adopted for it. Data are generally placed on storage devices of different performance according to their access heat. At present, data heat prediction relies on heuristic algorithms based on human experience, and their accuracy is low. This paper proposes a method that uses a long short-term memory (LSTM) neural network to predict the future access heat of files from their access features, covering the network structure design and the training and prediction algorithms. By dividing time into dynamic windows, a time series of file access features is constructed and the access trends of different data are predicted. Taking real data from the high-energy physics experiment LHAASO as an example, the model improves prediction accuracy by about 30% compared with existing algorithms such as SVM and MLP, and shows stronger applicability.
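The windowing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes fixed-width windows rather than the dynamic windows described above, uses a raw access count as the only feature, and labels a window "hot" with a hypothetical threshold. The function names `access_counts` and `make_sequences` and the sample timestamps are invented for illustration.

```python
from collections import Counter


def access_counts(timestamps, start, window, n_windows):
    """Count accesses in each of n_windows consecutive windows of width `window`.

    Assumption: fixed-width windows; the paper describes dynamic windows.
    """
    counts = Counter()
    for t in timestamps:
        idx = int((t - start) // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return [counts[i] for i in range(n_windows)]


def make_sequences(series, seq_len, hot_threshold):
    """Build (input sequence, label) pairs for a sequence model such as an LSTM.

    Each input holds seq_len consecutive window counts; the label is 1 if the
    following window reaches hot_threshold accesses (hypothetical criterion).
    """
    samples = []
    for i in range(len(series) - seq_len):
        x = series[i:i + seq_len]
        y = 1 if series[i + seq_len] >= hot_threshold else 0
        samples.append((x, y))
    return samples


# Hypothetical access times (in seconds) for a single file.
timestamps = [5, 12, 70, 75, 80, 130, 190, 195, 250]
series = access_counts(timestamps, start=0, window=60, n_windows=5)
samples = make_sequences(series, seq_len=3, hot_threshold=2)
```

Each pair in `samples` could then be fed to a sequence classifier; the choice of features, window sizes, and heat labels in the actual method is not specified in the abstract.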