Computer Engineering ›› 2021, Vol. 47 ›› Issue (12): 147-155,162. doi: 10.19678/j.issn.1000-3428.0059799

• Cyberspace Security •

FPGA Accelerator for 3DES Algorithm Based on OpenCL

WU Jianfeng1, ZHENG Bowen2, NIE Yi2, CHAI Zhilei1,3

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China;
  2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China;
  3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214215, China
  • Received: 2020-10-20 Revised: 2020-12-07 Published: 2021-12-08
  • About the authors: WU Jianfeng (born 1996), female, master's student, whose main research interests are information security and computer architecture; ZHENG Bowen and NIE Yi, master's students; CHAI Zhilei, professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61972180); Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2018A04).

Abstract: In fields such as digital currency, blockchain, and cloud data encryption, traditional software-based encryption and decryption is slow, occupies host resources, and consumes considerable power, while Field Programmable Gate Array (FPGA) encryption and decryption systems implemented in Verilog/VHDL suffer from long development cycles and are difficult to maintain and upgrade. To address these problems, an OpenCL-based FPGA accelerator design for the 3DES algorithm is proposed. The design is a pipelined parallel structure with 48 rounds of iteration. In the data transmission module, data storage adjustment and data bit width improvement strategies raise the kernel's actual bandwidth utilization. In the algorithm encryption module, an instruction stream optimization strategy forms a pipelined parallel architecture. Kernel vectorization and compute unit replication further improve kernel performance. Experimental results show that the accelerator achieves a throughput of 111.801 Gb/s on an Intel Stratix 10 GX2800. Compared with an Intel Core i7-9700 CPU, it improves performance by 372 times and energy efficiency by 644 times; compared with an Nvidia GeForce GTX 1080Ti GPU, it improves performance by 20% and energy efficiency by 9 times.
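
The figure of 48 rounds is consistent with how 3DES is built: it applies DES three times (encrypt, decrypt, encrypt), and each DES pass runs 16 Feistel rounds. The Python sketch below illustrates only that round structure. It is not the authors' OpenCL kernel and it is not real DES: `toy_round_function` and `toy_subkeys` are hypothetical stand-ins for the DES f-function and key schedule, chosen so the example stays short.

```python
# Toy illustration of the 3 x 16 = 48 round structure of 3DES.
# ASSUMPTION: the round function and key schedule below are made up for
# illustration; they are NOT the DES f-function or the DES key schedule.
MASK32 = 0xFFFFFFFF
ROUNDS_PER_PASS = 16                  # DES uses 16 Feistel rounds per pass
TOTAL_ROUNDS = 3 * ROUNDS_PER_PASS    # three passes (EDE) give 48 rounds


def toy_round_function(half, subkey):
    """Hypothetical stand-in for the DES f-function."""
    x = (half ^ subkey) & MASK32
    return ((x * 0x9E3779B1) ^ (x >> 15)) & MASK32


def toy_subkeys(key):
    """Hypothetical key schedule: 16 subkeys of 32 bits from one key."""
    return [(key * (2 * i + 1) + i) & MASK32 for i in range(ROUNDS_PER_PASS)]


def feistel_pass(block, subkeys):
    """Run one 16-round Feistel pass over a 64-bit block."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in subkeys:
        left, right = right, left ^ toy_round_function(right, k)
    # Final swap: the same pass with reversed subkeys then inverts this one.
    return (right << 32) | left


def encrypt_pass(block, key):
    return feistel_pass(block, toy_subkeys(key))


def decrypt_pass(block, key):
    return feistel_pass(block, toy_subkeys(key)[::-1])


def triple_encrypt(block, k1, k2, k3):
    """EDE order: encrypt with k1, decrypt with k2, encrypt with k3."""
    return encrypt_pass(decrypt_pass(encrypt_pass(block, k1), k2), k3)


def triple_decrypt(block, k1, k2, k3):
    """Inverse of triple_encrypt: undo the three passes in reverse order."""
    return decrypt_pass(encrypt_pass(decrypt_pass(block, k3), k2), k1)
```

Because the rounds within a block depend on each other only in sequence, and separate blocks are independent when no chaining mode links them, a structure like this can in principle be unrolled so that each round becomes one pipeline stage. How the accelerator actually maps the rounds to hardware is described in the paper itself, not in this sketch.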

Key words: OpenCL framework, Field Programmable Gate Array (FPGA), encryption and decryption algorithm, 3DES algorithm, pipeline parallel structure

CLC Number: