Computer Engineering ›› 2026, Vol. 52 ›› Issue (3): 264-275. doi: 10.19678/j.issn.1000-3428.0069251

• Computer Architecture and Advanced Computing •

SmartNIC-Offloaded Worker Node for Storage-Compute Disaggregated Recommendation System

SHI Ruixin, YAN Ming*, WU Jie

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received: 2024-01-18 Revised: 2024-09-13 Online: 2026-03-15 Published: 2024-12-06
  • Contact: YAN Ming

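  • About the authors:

    SHI Ruixin (CCF student member), female, master's student; research interests: SmartNICs, recommendation systems, and computer networks

    YAN Ming (CCF member, corresponding author), engineer, M.S.

    WU Jie (CCF member), research fellow, Ph.D., doctoral supervisor

  • Supported by:
    National Key Research and Development Program of China, key special project "Scientific and Technological Support for Social Governance and Smart Society" (2021YFC3300600)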

Abstract:

Deep learning-based recommendation systems are widely used in industry to provide personalized recommendations. In the common storage-compute disaggregated inference architecture, inference speed is limited by the inter-node network transmission bottleneck caused by embedding queries. Emerging SmartNIC technology enables complex traffic control without contending for host Central Processing Unit (CPU) resources, offering new possibilities for optimizing the embedding layer in disaggregated recommendation systems. This study designs and implements the SmartNIC-offloaded Worker Node (SmartWN), a worker node for disaggregated recommendation systems optimized via a SmartNIC. By leveraging the independent computing and communication capabilities of the SmartNIC, SmartWN implements embedding query reordering and pre-preparation, along with traffic-based dynamic cache management for multiple embedding tables, without affecting the worker node's host resources. This improves the communication efficiency and cache utilization of embedding queries during inference, reduces embedding query latency, and improves the inference performance of the disaggregated recommendation system. A SmartWN prototype is implemented on an NVIDIA BlueField-2 SmartNIC and its performance gains are verified. Compared with existing technologies, using SmartWN as the compute node in a disaggregated recommendation system improves embedding-layer query throughput during inference by up to 2.13x and reduces embedding query tail latency by approximately 50.6%.

Key words: SmartNIC, recommendation system, storage-compute disaggregation, cache management, embedding lookup, performance optimization
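
The abstract names traffic-based dynamic cache management across multiple embedding tables but gives no algorithmic detail. The following is a minimal illustrative sketch of that general idea only, not the SmartWN implementation: it assumes one LRU cache per table and a periodic rebalance that splits a fixed slot budget in proportion to each table's recent lookup count. The class `MultiTableCache`, its methods, and the `fetch` callback are hypothetical names introduced here, and the code runs as ordinary host Python rather than on a SmartNIC.

```python
from collections import Counter, OrderedDict


class MultiTableCache:
    """Toy multi-table embedding cache (hypothetical, not the paper's design)."""

    def __init__(self, table_names, total_slots):
        self.total_slots = total_slots
        self.traffic = Counter()  # lookups per table in the current window
        self.caches = {name: OrderedDict() for name in table_names}
        # Assumption: start from an even split of the slot budget.
        even = total_slots // len(table_names)
        self.quota = {name: even for name in table_names}

    def lookup(self, table, key, fetch):
        """Return the embedding; call fetch(table, key) only on a miss."""
        self.traffic[table] += 1
        cache = self.caches[table]
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            return cache[key]
        value = fetch(table, key)  # stands in for a remote storage-node query
        cache[key] = value
        self._evict(table)
        return value

    def _evict(self, table):
        """Drop least recently used entries until the table fits its quota."""
        cache = self.caches[table]
        while len(cache) > self.quota[table]:
            cache.popitem(last=False)

    def rebalance(self):
        """Resize quotas in proportion to observed traffic, then reset counts."""
        total = sum(self.traffic.values())
        if total == 0:
            return
        for name in self.caches:
            # Floor division keeps the sum of quotas within total_slots.
            self.quota[name] = self.total_slots * self.traffic[name] // total
            self._evict(name)
        self.traffic.clear()
```

In this sketch a caller would invoke `rebalance()` at the end of each traffic window, so tables that receive more queries gradually gain cache slots at the expense of colder tables. How SmartWN actually measures traffic and resizes its caches is not stated in the abstract.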