
Computer Engineering (计算机工程)



Efficiency Optimization Method for the Bi-LSTM Operator Targeting LUNA Chips

  • Published: 2025-08-12

Abstract: Efficiency optimization of deep learning models is a key research focus in applied artificial intelligence. When deploying a deep learning model, efficiency gains can be obtained by reducing operator scheduling overhead and raising operator execution efficiency. This paper targets the Bi-directional Long Short-Term Memory (Bi-LSTM) structure widely used in temporal networks. Exploiting the fact that the forward and backward Long Short-Term Memory (LSTM) cells in this structure consume the same input sequence and can therefore reuse it, and combining operator fusion with tensor computation merging, the paper proposes an efficiency optimization method for the Bi-LSTM operator on the LUNA chip. The method reduces time overhead and improves the execution efficiency of the Bi-LSTM operator by eliminating redundant operations, reusing data, and merging tensor computations, and it also generalizes to other temporal network operators such as Bi-RNN and Bi-GRU. An experimental platform is built on the domestically produced edge-side LUNA chip to validate the method. Experimental results show that the proposed Bi-LSTM efficiency optimization method achieves an improvement of up to 37.6%.
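The abstract names its two concrete techniques, input reuse and tensor computation merging, only at a high level. The sketch below is a purely illustrative NumPy rendering of that idea, assuming a standard LSTM gate layout: because both directions of a Bi-LSTM read the same input sequence, their input-to-gate projections can be merged into one large matrix multiplication instead of two per time step. All identifiers (bilstm_input_projection_merged, W_fwd, W_bwd, etc.) are hypothetical; the paper's LUNA-specific operator fusion and scheduling are not reproduced here.

```python
import numpy as np

def bilstm_input_projection_merged(x, W_fwd, W_bwd, b_fwd, b_bwd):
    """Illustrative sketch (hypothetical names): both LSTM directions of a
    Bi-LSTM read the same input sequence x, so their input-to-gate
    projections can be merged into one GEMM over the whole sequence
    instead of 2*T per-timestep matrix multiplications.

    x:      (T, D)   input sequence shared by both directions
    W_fwd:  (D, 4H)  stacked gate weights (i, f, g, o) of the forward cell
    W_bwd:  (D, 4H)  stacked gate weights of the backward cell
    """
    # Merge the two directions' weights along the output axis, then
    # project the whole sequence at once: a single (T, D) x (D, 8H) GEMM.
    W = np.concatenate([W_fwd, W_bwd], axis=1)   # (D, 8H)
    b = np.concatenate([b_fwd, b_bwd])           # (8H,)
    proj = x @ W + b                             # (T, 8H)
    four_h = W_fwd.shape[1]
    # Split back into the forward and backward halves.
    return proj[:, :four_h], proj[:, four_h:]

# Usage: only the recurrent (hidden-to-hidden) part still runs per time
# step and per direction; the dominant input GEMM is computed once.
T, D, H = 16, 32, 64
x = np.random.randn(T, D)
W_f, W_b = np.random.randn(D, 4 * H), np.random.randn(D, 4 * H)
b_f, b_b = np.zeros(4 * H), np.zeros(4 * H)
proj_f, proj_b = bilstm_input_projection_merged(x, W_f, W_b, b_f, b_b)
assert proj_f.shape == (T, 4 * H) and proj_b.shape == (T, 4 * H)
```

The same merging applies unchanged to Bi-RNN and Bi-GRU cells, whose input projections differ only in the number of stacked gates.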