Computer Engineering

   

A Survey on Optimizing Log-Structured Merge-tree Based on Computational Storage Technology

  

Published: 2025-11-20


Abstract: The Log-Structured Merge tree (LSM-tree) is widely adopted in key-value storage systems because its sequential write pattern delivers high write performance. However, it also suffers from high read and write amplification, costly compaction, and data redundancy. Traditional optimizations improve system performance by modifying the tree structure, refining compaction strategies, or separating keys from values. In the era of big data, rapidly growing data volumes make writes and compactions in LSM-tree systems increasingly frequent, placing sustained pressure on host CPU resources, which gradually become the performance bottleneck. Moreover, traditional solutions cannot fundamentally avoid the heavy I/O traffic between the host and storage devices, so they still incur high overhead from redundant data movement. Computational storage technology offers a promising way to address these challenges. By integrating computing resources into the storage layer, it allows tasks to be offloaded to relieve the host CPU and enables near-data processing to reduce the cost of data migration. This survey focuses on LSM-tree optimization strategies based on computational storage. First, it reviews computational storage architectures. Then, targeting the major bottlenecks in the big data context, it classifies and compares existing solutions from two perspectives: compaction optimization and data migration optimization. Finally, it discusses the limitations of current research and suggests future research directions to provide insights for this field.
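The abstract identifies compaction as the main consumer of CPU and of host-storage I/O, and as the task that computational storage offloads. As an illustration only, the following is a minimal Python sketch of one compaction step as a k-way merge of sorted runs that keeps the newest version of each key. It also includes a deliberately simplified model of the bytes that cross the host-storage interface. The entry layout `(key, seq, value)`, the `bottommost` flag, and the function `host_device_traffic` are hypothetical. They are not taken from any system surveyed here, and real engines and computational storage devices differ considerably.

```python
from heapq import merge

# Hypothetical entry layout: (key, seq, value); value None marks a deletion (tombstone).
# Each run (SSTable) is sorted by key, with at most one entry per key.

def compact(runs, bottommost=True):
    """Merge sorted runs into one run, keeping only the newest version of each key."""
    out = []
    last_key = None
    # Order by key ascending, then sequence number descending: newest version first.
    for key, seq, value in merge(*runs, key=lambda e: (e[0], -e[1])):
        if key == last_key:
            continue  # older, superseded version: discarded, reclaiming space
        last_key = key
        if value is None and bottommost:
            continue  # a tombstone can be dropped only at the bottommost level
        out.append((key, seq, value))
    return out


def host_device_traffic(runs, out, entry_size, offloaded):
    """Bytes crossing the host-storage interface for one compaction (simplified model)."""
    if offloaded:
        # Assumption: the device reads inputs and writes output internally;
        # the size of the offload command itself is ignored.
        return 0
    # Host-side compaction: every input entry is read into host memory,
    # and every output entry is written back to the device.
    return (sum(len(r) for r in runs) + len(out)) * entry_size


newer = [("a", 5, "A2"), ("c", 6, None)]
older = [("a", 1, "A1"), ("b", 2, "B1"), ("c", 3, "C1")]
merged = compact([newer, older])
print(merged)  # [('a', 5, 'A2'), ('b', 2, 'B1')]
print(host_device_traffic([newer, older], merged, 100, offloaded=False))  # 700
print(host_device_traffic([newer, older], merged, 100, offloaded=True))   # 0
```

In this toy case, host-side compaction moves all five input entries and both output entries across the interface (700 bytes at an assumed 100 bytes per entry). Near-data compaction keeps that traffic inside the device, which is the kind of data-migration saving the abstract attributes to computational storage.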
