Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 194-203. doi: 10.19678/j.issn.1000-3428.0069497

• Architecture and Software Technology •

FEC-ABP: 一种自适应降低HPC互连网络数据传输时延的方法

曹继军1, 王耀慧1, 罗章1, 孔德永2,*

  1. 国防科技大学计算机学院, 湖南 长沙 410073
  2. 湖北经济学院信息工程学院, 湖北 武汉 430205
  • 收稿日期:2024-03-06 修回日期:2024-07-17 出版日期:2025-11-15 发布日期:2024-08-20
  • 通讯作者: 孔德永
  • 基金资助:
    国家重点研发计划(2021YFB2206601)

FEC-ABP: An Adaptive Method for Reducing Data Transmission Latency in HPC Interconnect Networks

CAO Jijun1, WANG Yaohui1, LUO Zhang1, KONG Deyong2,*

  1. College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China
  2. School of Information Engineering, Hubei University of Economics, Wuhan 430205, Hubei, China
  • Received:2024-03-06 Revised:2024-07-17 Online:2025-11-15 Published:2024-08-20
  • Contact: KONG Deyong

摘要:

为了降低高性能计算(HPC)互连网络的数据传输延迟, 通常在物理编码子层(PCS)采用可配置旁路纠错方法, 但是存在难以适应物理介质误码变化性、难以适应链路层报文与编码子层前向纠错(FEC)块的粒度差异性等问题。为此, 提出一种自适应旁路FEC解码过程的方法FEC-ABP。FEC-ABP优化接收端数据处理过程, 使得经过通道锁定和重定序的数据复制为两路, 子流程A数据经过完整的FEC解码及后续处理(即删除对齐标记和校验码、解扰、257/264解码、66/64解码和速率匹配)进入链路层, 而子流程B数据完全旁路FEC解码, 只经过后续处理进入链路层。链路层并行处理两路数据, 根据各路报文携带的循环冗余校验码(CRC)和序列号判定接收哪路报文, 并采用Go-back-N机制负责不可纠错报文的重传。基于FEC-ABP方法, 无差错报文传输可以获得旁路FEC解码带来的低延迟性能, 而可纠错报文传输可以获得FEC解码纠错带来的可靠性。实验结果表明, FEC-ABP方法以较低的资源消耗代价获取了较好的数据平均传输延迟优化效果, 这对于实现HPC互连网络的较低延迟数据传输具有重要作用。

关键词: 互连网络, 前向纠错编码, Go-back-N重传机制, 旁路解码, 低延迟

Abstract:

To reduce data transmission latency in High-Performance Computing (HPC) interconnect networks, a configurable error-correction bypass is typically applied at the Physical Coding Sublayer (PCS). However, this approach adapts poorly to variation in the Bit Error Rate (BER) of the physical medium and to the difference in granularity between link-layer packets and PCS Forward Error Correction (FEC) blocks. To address these problems, this paper proposes FEC-ABP, a method that adaptively bypasses the FEC decoding process. FEC-ABP modifies receive-side data processing so that the data, after lane locking and reordering, are duplicated into two paths. In subprocess A, the data undergo complete FEC decoding and the subsequent processing steps (i.e., removal of alignment markers and check codes, descrambling, 257/264 decoding, 66/64 decoding, and rate matching) before entering the link layer. In subprocess B, the data bypass FEC decoding entirely and undergo only the subsequent processing steps before entering the link layer. The link layer processes the two paths in parallel, decides which path's packet to accept based on the Cyclic Redundancy Check (CRC) code and sequence number carried by each packet, and uses the Go-back-N mechanism to retransmit uncorrectable packets. With FEC-ABP, error-free packets obtain the low latency of bypassing FEC decoding, while packets with correctable errors obtain the reliability provided by FEC error correction. Experimental results show that FEC-ABP achieves a good reduction in average data transmission latency at a low resource cost, which is important for low-latency data transmission in HPC interconnect networks.

Key words: interconnect network, Forward Error Correction (FEC) coding, Go-back-N retransmission mechanism, bypass decoding, low latency
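
The receive-side selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Packet`, `make_packet`, `corrupt`, and `LinkReceiver` are hypothetical, CRC-32 from Python's `zlib` stands in for whatever CRC the link layer actually uses, and timing is abstracted away (in the described design the bypass copy would reach the link layer before the FEC-decoded copy).

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    seq: int        # link-layer sequence number
    payload: bytes
    crc: int        # CRC over sequence number and payload (CRC-32 is an assumption)


def _crc(seq: int, payload: bytes) -> int:
    return zlib.crc32(seq.to_bytes(4, "big") + payload)


def make_packet(seq: int, payload: bytes) -> Packet:
    return Packet(seq, payload, _crc(seq, payload))


def corrupt(pkt: Packet) -> Packet:
    # Flip one payload bit to model an uncorrected transmission error.
    flipped = bytes([pkt.payload[0] ^ 0x01]) + pkt.payload[1:]
    return Packet(pkt.seq, flipped, pkt.crc)


class LinkReceiver:
    """Hypothetical link-layer receiver fed by the two PCS paths."""

    def __init__(self) -> None:
        self.expected_seq = 0
        self.delivered: List[bytes] = []

    def receive(self, bypass_copy: Optional[Packet],
                fec_copy: Optional[Packet]) -> str:
        # Prefer the bypass copy (subprocess B); fall back to the
        # FEC-decoded copy (subprocess A) if the bypass copy fails its checks.
        for source, pkt in (("bypass", bypass_copy), ("fec", fec_copy)):
            if (pkt is not None
                    and _crc(pkt.seq, pkt.payload) == pkt.crc
                    and pkt.seq == self.expected_seq):
                self.delivered.append(pkt.payload)
                self.expected_seq += 1
                return source
        # Neither copy is usable: Go-back-N resends from expected_seq.
        return "retransmit"
```

In this sketch, an error-free packet is accepted from the bypass path, a packet whose errors FEC can correct is accepted from the FEC path, and a packet that fails on both paths leaves `expected_seq` unchanged so that a Go-back-N sender would resend from that sequence number.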