Implementation and Optimization of Base-calling Algorithm Based on DCU

doi:10.19678/j.issn.1000-3428.0252076

Abstract

Basecalling, the conversion of raw electrical signals into DNA sequences in nanopore sequencing, is a critical step whose computational efficiency directly determines the timeliness of genomic analysis. To address the limitations of existing basecalling tools in computational acceleration, hardware adaptation, and system-level optimization, this study proposes and implements three optimization strategies that significantly improve computational performance and enable deployment on domestic hardware platforms. The main contributions are as follows. First, we developed OpenKoi, a high-performance acceleration library for heterogeneous computing architectures, built on operator-level optimizations. For key operators such as LSTM and conditional random fields (CRFs), we introduced a novel matrix concatenation strategy and a parallel execution scheme that reduce the number of GEMM operations required for each LSTM step from eight to one, and we implemented a beam search algorithm parallelized at thread-block granularity. Second, we proposed a heterogeneous pipeline architecture that overcomes the I/O bottleneck of the traditional workflow by running three stages in parallel: data loading, GPU computation, and result output. This architecture achieved linear scaling on the DCU platform. Third, we developed DCUCaller, the first basecalling system to support the domestic Hygon DCU. Its novelty lies in the co-optimization of hardware adaptation and quantization techniques: it uses the HIP programming framework for cross-platform compatibility and integrates the OpenKoi library and the heterogeneous pipeline framework to raise basecalling throughput. Through advances in algorithm design, system architecture, and hardware ecosystem integration, this study not only significantly improves the efficiency of basecalling but also provides key technical support for the large-scale application of domestic computing platforms in bioinformatics, and it is of strategic significance for the independent development of genome sequencing technologies.
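
The reduction from eight GEMMs to one per LSTM step can be illustrated with a small sketch. The abstract does not describe OpenKoi's actual memory layout or kernels, so what follows is only the textbook form of weight concatenation, under the assumption that the eight GEMMs are the input and recurrent products of the four gates. All names here (`fuse_parameters`, `lstm_step_fused`, the gate order `ifgo`) are hypothetical and chosen for illustration; plain Python lists stand in for device matrices.

```python
import math
import random

GATES = "ifgo"  # input, forget, cell-candidate, output (assumed gate order)

def matmul(a, b):
    """Plain GEMM on lists of rows: (m x k) @ (k x n) -> (m x n)."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def apply_gates(pre, c):
    """Turn gate pre-activations (dict: gate -> H x B) into the new (h, c)."""
    hidden, batch = len(c), len(c[0])
    new_c = [[sigmoid(pre["f"][r][n]) * c[r][n]
              + sigmoid(pre["i"][r][n]) * math.tanh(pre["g"][r][n])
              for n in range(batch)] for r in range(hidden)]
    new_h = [[sigmoid(pre["o"][r][n]) * math.tanh(new_c[r][n])
              for n in range(batch)] for r in range(hidden)]
    return new_h, new_c

def lstm_step_separate(x, h, c, W, U, b):
    """Reference step: two GEMMs per gate, eight in total."""
    pre = {}
    for gate in GATES:
        wx = matmul(W[gate], x)  # input product for this gate
        uh = matmul(U[gate], h)  # recurrent product for this gate
        pre[gate] = [[wx[r][n] + uh[r][n] + b[gate][r] for n in range(len(x[0]))]
                     for r in range(len(h))]
    return apply_gates(pre, c)

def fuse_parameters(W, U, b):
    """Concatenate [W_gate | U_gate] row-wise and stack the four gates."""
    big_w = [W[gate][r] + U[gate][r] for gate in GATES for r in range(len(b[gate]))]
    big_b = [b[gate][r] for gate in GATES for r in range(len(b[gate]))]
    return big_w, big_b

def lstm_step_fused(x, h, c, big_w, big_b):
    """Fused step: one GEMM of shape (4H x (I+H)) @ ((I+H) x B)."""
    hidden, batch = len(h), len(h[0])
    z = matmul(big_w, x + h)  # x + h stacks the rows of x on top of h
    pre = {gate: [[z[k * hidden + r][n] + big_b[k * hidden + r] for n in range(batch)]
                  for r in range(hidden)]
           for k, gate in enumerate(GATES)}
    return apply_gates(pre, c)

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Tiny check: input size I, hidden size H, batch size B.
rng = random.Random(0)
I, H, B = 3, 4, 2
W = {gate: rand_matrix(H, I, rng) for gate in GATES}
U = {gate: rand_matrix(H, H, rng) for gate in GATES}
b = {gate: [rng.uniform(-1.0, 1.0) for _ in range(H)] for gate in GATES}
x, h, c = rand_matrix(I, B, rng), rand_matrix(H, B, rng), rand_matrix(H, B, rng)

h_ref, c_ref = lstm_step_separate(x, h, c, W, U, b)
h_fused, c_fused = lstm_step_fused(x, h, c, *fuse_parameters(W, U, b))
print(max_abs_diff(h_ref, h_fused), max_abs_diff(c_ref, c_fused))
```

The two steps should agree up to floating-point rounding, since the fused product only reorders the same multiply-adds. On an accelerator the gain comes from launching one large GEMM instead of eight small ones; the weight concatenation is done once, outside the time-step loop.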

摘要： 碱基识别作为纳米孔测序技术中将原始电信号转换为DNA序列的核心环节，其计算效率直接影响基因组分析流程的时效性。针对现有碱基识别工具在计算加速、硬件适配和系统优化方面存在的不足，本研究提出并实现了三大创新性优化方案，显著提升了碱基识别任务的计算性能和国产化部署能力。本研究主要创新体现在：第一，构建了基于异构计算架构的OpenKoi加速库，通过算子级优化实现了算法层面的突破。针对LSTM和条件随机场等核心算子，创新性地设计了矩阵拼接策略与并行化执行方案，将LSTM单步计算所需的GEMM操作从8次降为1次，并开发了基于线程块粒度的集束搜索算法。第二，提出了异构流水线架构，攻克了传统流程中的I/O瓶颈问题，实现了数据读取、GPU计算与结果写入的三级流水并行，在DCU平台获得线性扩展效率。第三，研发了首个支持国产海光DCU的碱基识别系统DCUCaller，其创新性体现在硬件适配与量化技术的协同优化。通过HIP编程框架实现跨平台兼容，采用本研究的OpenKoi库与异构流水线框架来优化碱基识别算法的吞吐量。本研究通过算法优化、系统架构创新和硬件生态建设的三维突破，不仅显著提升了碱基识别任务的执行效率，更为国产计算平台在生物信息领域的大规模应用提供了关键技术支撑，对推动基因测序技术的自主化发展具有重要战略意义。

Bo Kaibin, Li Yewen, Zhang Zhonghai, Tan Guangming. Implementation and Optimization of Base-calling Algorithm Based on DCU[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252076.

薄凯彬, 李叶文, 张中海, 谭光明. 基于海光DCU的碱基识别算法的实现与优化[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0252076.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252076

