A Survey on Optimizing Log-Structured Merge-tree Based on Computational Storage Technology

doi:10.19678/j.issn.1000-3428.0252595

Abstract

The Log-Structured Merge tree (LSM-tree) is widely adopted in key-value storage systems because its sequential writes deliver high write performance. However, it suffers from high read/write amplification, significant compaction overhead, and data redundancy. Traditional optimizations improve performance by modifying the tree structure, refining compaction strategies, and adopting key-value separation. In the era of big data, rapidly growing data volumes make write and compaction operations increasingly frequent, placing sustained pressure on CPU resources and gradually turning the CPU into a performance bottleneck. Moreover, traditional solutions do not avoid the substantial I/O traffic between the host and storage devices, so they still incur high overhead from redundant data movement. Computational storage offers a promising way to address these challenges. By integrating computing resources at the storage layer, it enables task offloading to relieve the CPU and supports near-data processing to reduce the overhead of data movement. This survey focuses on LSM-tree optimizations based on computational storage. First, the architecture of computational storage is reviewed. Then, targeting the major bottlenecks in big data settings, existing solutions are classified and compared from two perspectives: compaction optimization and data migration optimization. Finally, potential future research directions are suggested.

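Abstract (translated from the Chinese): The Log-Structured Merge tree (LSM-tree) is widely used in key-value storage systems. Its sequential write mechanism delivers efficient write performance, but it also brings problems such as high read/write amplification, costly compaction, and data redundancy. Traditional optimizations improve system performance by adjusting the tree structure, optimizing compaction strategies, and adopting key-value separation. In the era of big data, however, data volumes are surging and the LSM-tree must handle more frequent writes and compactions, which keeps CPU resources under constant strain and gradually makes them the bottleneck for further performance gains. In addition, traditional optimizations fail to avoid the large volume of I/O between the host and storage devices and still face high overhead from redundant data migration. Computational storage technology offers a new approach to these challenges. It deploys additional computing resources at the storage layer, relieving the CPU through task offloading or, further, reducing the performance loss caused by data migration through near-data processing. This paper focuses on LSM-tree optimization based on computational storage technology. First, computational storage architectures are reviewed. Then, targeting the main bottlenecks that systems face in the big data context, existing schemes are classified, introduced, and compared from two perspectives: compaction optimization and data migration optimization. Finally, in light of the limitations of current research and development trends, future research directions are discussed.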
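To make the compaction cost mentioned in the abstract concrete, the following is a minimal sketch of a single compaction pass: several sorted runs are merged and only the newest version of each key survives. It is an illustration under stated assumptions, not the method of any system covered by the survey. The record layout `(key, sequence_number, value)`, the use of `None` as a delete marker, and the function name `compact` are all hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical record layout: (key, sequence_number, value).
# A value of None marks a delete (tombstone).
Record = Tuple[str, int, Optional[str]]


def compact(runs: List[List[Record]], drop_tombstones: bool = True) -> List[Record]:
    """Merge sorted runs, keeping only the newest version of each key.

    Assumes every run is already sorted by key, with at most one
    record per key inside a run.
    """
    # Order by key ascending, then sequence number descending, so the
    # newest version of a key is always seen first.
    merged = heapq.merge(*runs, key=lambda r: (r[0], -r[1]))

    out: List[Record] = []
    last_key: Optional[str] = None
    for key, seq, value in merged:
        if key == last_key:
            continue  # older, shadowed version of a key already handled
        last_key = key
        if value is None and drop_tombstones:
            continue  # deleted key: drop it from the output run
        out.append((key, seq, value))
    return out


newer = [("a", 5, "a2"), ("c", 6, None)]
older = [("a", 1, "a1"), ("b", 2, "b1"), ("c", 3, "c1")]
result = compact([newer, older])
```

Every input record is read, compared, and possibly rewritten even when most of it is stale. On a host CPU this work competes with foreground requests and pulls all the data across the host-storage link, which is the kind of load that offloading and near-data processing aim to reduce.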

LIU Ying, ZHANG Runyu, YANG Chaoshu. A Survey on Optimizing Log-Structured Merge-tree Based on Computational Storage Technology[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252595.

刘颖, 张润宇, 杨朝树. 面向计算存储技术的日志结构合并树优化研究综述[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0252595.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252595

