Computer Engineering, 2026, Vol. 52, Issue (3): 287-298. doi: 10.19678/j.issn.1000-3428.0070280

• Architecture and Advanced Computing •

SSD-SMR Write Cache Strategy for Optimizing Long-Tail Latency Based on Q-Learning

LIU Jian1, ZHANG Buhao2, FANG Kuangchi1, LIU Xuanfeng1, SUN Guodao1, LIANG Ronghua1, LIANG Haoran1,*

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, Zhejiang, China
  2. Enjoyor Technology Co., Ltd., Hangzhou 311400, Zhejiang, China
  • Received: 2024-08-22 Revised: 2024-10-29 Online: 2026-03-15 Published: 2024-12-23
  • Contact: LIANG Haoran
  • About the authors:

    LIU Jian (CCF member), male, lecturer, Ph.D.; main research interests: time-series databases and storage systems

    ZHANG Buhao, associate senior engineer

    FANG Kuangchi, master's degree

    LIU Xuanfeng, master's degree

    SUN Guodao, associate professor, Ph.D.

    LIANG Ronghua, professor, Ph.D.

    LIANG Haoran (corresponding author), associate research fellow, Ph.D.

  • Supported by:
    Key Program of the Regional Innovation and Development Joint Fund of the National Natural Science Foundation of China (U24A20247); National Natural Science Foundation of China (62202430); National Natural Science Foundation of China (62176235); National Natural Science Foundation of China (62432014); Natural Science Foundation of Zhejiang Province (LY24F020018); Natural Science Foundation of Zhejiang Province (LR23F020003); Open Project of the Engineering Research Center of Intelligent Transportation of Zhejiang Province (2023ERCITZJ-KF02)

Abstract:

As global data volumes continue to grow, improving data access performance at low cost is a major challenge for storage systems. One effective solution is to build a cache system from low-latency, high-bandwidth Solid-State Drives (SSDs) and low-cost, high-density Shingled Magnetic Recording (SMR) disks. However, the mechanical motion and overlapping-track layout inherent to SMR give it poor write performance, and frequent write-back of dirty data from the SSD to the SMR disk triggers a large number of Read-Merge-Write (RMW) operations, which can cause severe long-tail latency. To address this, a cache replacement optimization strategy that incorporates the reinforcement learning algorithm Q-Learning is proposed for the SSD-SMR hybrid storage architecture. The strategy learns the empirical relationship between the I/O load of the SMR device and its latency and uses it to control writes to the SMR disk: when the SMR load is high, it controls the eviction of dirty data from the cache to reduce the RMW operations caused by write-backs, thereby lowering the tail latency of the system under different loads. Q-Learning is combined with LRU, a caching algorithm based on data popularity, and with SAC, an SMR-aware caching algorithm, and the combinations are evaluated on real enterprise traces and on simulated traces generated by YCSB. The experimental results show that the proposed method effectively improves the performance of existing caching algorithms, reducing average latency by 57.06% and tail latency by 87.49%.

Key words: Q-Learning algorithm, I/O load, long-tail latency, cache replacement algorithm, hybrid storage
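
The abstract does not give the state, action, or reward design used in the paper, so the following is only a minimal sketch of how tabular Q-Learning could gate dirty-data eviction on SMR load. Everything in it is an assumption made for illustration: the three load levels, the two actions `evict_clean` and `evict_dirty`, the reward (negative latency), and the toy latency numbers in `step`, which merely stand in for RMW cost growing with load. It is not the authors' implementation and it reproduces none of the reported results.

```python
import random

# Hypothetical action set: which kind of cache block to evict.
ACTIONS = ("evict_clean", "evict_dirty")
# Hypothetical discretization of SMR I/O load: 0 = low, 1 = medium, 2 = high.
N_LOADS = 3


def step(load, action, rng):
    """Toy environment (assumed, not measured): returns (next_load, reward)."""
    noise = rng.random() * 0.1
    if action == "evict_dirty":
        # A write-back gets more expensive as SMR load rises (stand-in for RMW)
        # and pushes the load one level up.
        latency = 0.5 + 4.0 * load + noise
        next_load = min(load + 1, N_LOADS - 1)
    else:
        # Evicting a clean block avoids the write-back and lets the load drain,
        # at a fixed assumed cost.
        latency = 1.0 + noise
        next_load = max(load - 1, 0)
    return next_load, -latency  # reward is negative latency


def train(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_LOADS)]
    load = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))  # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[load][i])  # exploit
        next_load, reward = step(load, ACTIONS[a], rng)
        # Standard Q-Learning update: move Q(s, a) toward r + gamma * max Q(s', .).
        td_target = reward + gamma * max(q[next_load])
        q[load][a] += alpha * (td_target - q[load][a])
        load = next_load
    return q


def greedy_policy(q):
    """Best action per load level according to the learned table."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Under this toy model the learned greedy policy should write dirty data back only when the SMR load is low and fall back to evicting clean blocks at medium and high load, which mirrors the behavior the abstract describes. In a real system the chosen action would be passed to the underlying replacement algorithm (LRU or SAC in the paper) to pick the concrete victim block.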