
Computer Engineering

   

Implementation and Optimization of Base-calling Algorithm Based on DCU

  

Online: 2025-07-14; Published: 2025-07-14


Abstract: Basecalling, the step in nanopore sequencing that converts raw electrical signals into DNA sequences, directly determines how quickly downstream genomic analysis can proceed. To address the shortcomings of existing basecalling tools in computational acceleration, hardware adaptation, and system-level optimization, this study proposes and implements three optimization strategies that substantially improve computational performance and enable deployment on domestic hardware platforms. The main contributions are as follows. First, we developed OpenKoi, a high-performance acceleration library for heterogeneous computing architectures that relies on operator-level optimization. For the key operators, LSTM and Conditional Random Field (CRF), we introduced a matrix concatenation strategy and a parallel execution scheme that reduce the number of GEMM operations per LSTM step from eight to one, and we implemented a beam search algorithm parallelized at thread-block granularity. Second, we proposed a heterogeneous pipeline architecture that removes the traditional I/O bottleneck by running data loading, GPU computation, and result writing as three concurrent pipeline stages; this architecture achieved linear scaling efficiency on the DCU platform. Third, we developed DCUCaller, the first basecalling system to support domestic DCU (Dawning Computing Unit) hardware, whose novelty lies in the joint optimization of hardware adaptation and quantization. DCUCaller uses the HIP programming framework for cross-platform compatibility and integrates the OpenKoi library and the heterogeneous pipeline framework to improve basecalling throughput.
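The reduction from eight GEMMs to one per LSTM step follows from the standard LSTM gate formulation: each of the four gates (input, forget, cell, output) multiplies both the input and the previous hidden state by its own weight matrix, giving 4 × 2 = 8 products. The pure-Python sketch below is an illustration under that assumption, not the OpenKoi implementation; the function names (`separate_gates`, `fused_gates`, `matmul`) and the call counter `GEMM_CALLS` are hypothetical, and biases and activation functions are omitted. It shows that stacking the gates vertically and placing the input and recurrent weights side by side yields one `4H x (I+H)` matrix, which, multiplied by the stacked input `[X; Hprev]`, produces exactly the same pre-activations with a single GEMM.

```python
import random

GEMM_CALLS = {"n": 0}  # hypothetical counter used to compare the two variants

def matmul(a, b):
    """Naive GEMM: a is m x k, b is k x n (lists of rows); returns m x n."""
    GEMM_CALLS["n"] += 1
    k, n = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(n)] for row in a]

def separate_gates(Wx, Wh, X, Hprev):
    """Eight GEMMs per step: for each of the 4 gates, one product with the input
    X (I x B) and one with the previous hidden state Hprev (H x B).
    Wx[g] is H x I and Wh[g] is H x H. Returns 4H x B gate pre-activations."""
    pre = []
    for g in range(4):
        a = matmul(Wx[g], X)
        b = matmul(Wh[g], Hprev)
        pre.extend([[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)])
    return pre

def fused_gates(Wx, Wh, X, Hprev):
    """One GEMM per step: stack the 4 gates vertically, place [Wx | Wh] side by
    side (4H x (I+H)), and multiply by the stacked input [X; Hprev] ((I+H) x B)."""
    W = [rx + rh for g in range(4) for rx, rh in zip(Wx[g], Wh[g])]
    Z = X + Hprev  # list concatenation stacks the rows of X above the rows of Hprev
    return matmul(W, Z)

if __name__ == "__main__":
    random.seed(0)
    I, H, B = 3, 4, 2  # input size, hidden size, batch size
    rnd = lambda r, c: [[random.randint(-3, 3) for _ in range(c)] for _ in range(r)]
    Wx = [rnd(H, I) for _ in range(4)]
    Wh = [rnd(H, H) for _ in range(4)]
    X, Hprev = rnd(I, B), rnd(H, B)
    print(separate_gates(Wx, Wh, X, Hprev) == fused_gates(Wx, Wh, X, Hprev))  # True
```

Integer weights keep the comparison exact; with floating-point data the two results agree only up to rounding because the summation order differs. The design benefit is that one larger GEMM keeps the accelerator busier and pays kernel-launch overhead once instead of eight times per time step.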
Through advances in algorithm design, system architecture, and hardware ecosystem integration, this study substantially improves basecalling efficiency and provides key technical support for the large-scale use of domestic computing platforms in bioinformatics, which is of strategic significance for the independent development of genome sequencing technologies.
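The abstract mentions a CRF beam search parallelized at thread-block granularity but gives no details, so the following is only a generic sketch under stated assumptions. It assumes log-domain CRF scores (per-step emission scores `emissions[t][s]` and a transition matrix `transitions[p][s]`), keeps the `beam_width` best partial state paths at each step, and treats each signal chunk as an independent decoding job, which is the property that lets one GPU thread block own one chunk. The names `beam_search` and `decode_chunks` are hypothetical, and the `ThreadPoolExecutor` only models the per-chunk decomposition; Python threads will not deliver real GPU-style speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

def beam_search(emissions, transitions, beam_width):
    """Keep the beam_width highest-scoring partial state paths at every step.
    emissions[t][s]: log-score of state s at step t.
    transitions[p][s]: log-score of moving from state p to state s.
    Returns (score, path) of the best surviving path."""
    n_states = len(emissions[0])
    beam = heapq.nlargest(beam_width, [(emissions[0][s], (s,)) for s in range(n_states)])
    for t in range(1, len(emissions)):
        # Expand every surviving path by every next state, then prune to the beam.
        candidates = [
            (score + transitions[path[-1]][s] + emissions[t][s], path + (s,))
            for score, path in beam
            for s in range(n_states)
        ]
        beam = heapq.nlargest(beam_width, candidates)
    return max(beam)

def decode_chunks(chunks, transitions, beam_width, workers=4):
    """Decode chunks independently; each job stands in for one thread block."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda em: beam_search(em, transitions, beam_width), chunks))

if __name__ == "__main__":
    em = [[1, 3, 0], [2, 0, 1], [0, 2, 2]]
    tr = [[0, -1, 1], [1, 0, -2], [-1, 2, 0]]
    print(decode_chunks([em, em], tr, beam_width=2))
```

A wider beam trades compute for accuracy: when `beam_width` is at least the number of possible paths the search is exhaustive, while small beams prune aggressively and may miss the optimum.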

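The three-stage heterogeneous pipeline described in the abstract (data loading, GPU computation, result writing) can be sketched with one thread per stage connected by bounded queues, so that while batch *i* is being written, batch *i+1* is being computed and batch *i+2* is being loaded. This is a minimal host-side illustration, not DCUCaller's code: `run_pipeline`, its parameters, and the sentinel are hypothetical, the `compute` callable stands in for launching work on the DCU, and error handling is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def run_pipeline(batches, compute, write, depth=2):
    """Run load -> compute -> write as three concurrent stages.
    Bounded queues (maxsize=depth) provide back-pressure so a fast stage
    cannot run arbitrarily far ahead of a slow one.
    Error handling is omitted: an exception in one stage would stall the others."""
    loaded = queue.Queue(maxsize=depth)
    computed = queue.Queue(maxsize=depth)

    def loader():
        for batch in batches:  # stand-in for reading raw signal from disk
            loaded.put(batch)
        loaded.put(_DONE)

    def worker():
        while (batch := loaded.get()) is not _DONE:
            computed.put(compute(batch))  # stand-in for the accelerator stage
        computed.put(_DONE)

    def writer():
        while (result := computed.get()) is not _DONE:
            write(result)  # stand-in for writing called bases

    threads = [threading.Thread(target=f) for f in (loader, worker, writer)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

if __name__ == "__main__":
    out = []
    run_pipeline(range(5), lambda b: b * b, out.append)
    print(out)  # [0, 1, 4, 9, 16]
```

With a single compute worker, output order matches input order. Scaling to several accelerators would add one compute worker per device, at which point results need sequence numbers to restore order before writing.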