作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2024, Vol. 50 ›› Issue (2): 15-24. doi: 10.19678/j.issn.1000-3428.0067167

• 热点与综述 • 上一篇    下一篇

基于AVX512的格密码高速并行实现

雷斗威, 何德彪*(), 罗敏, 彭聪   

  1. 武汉大学国家网络安全学院, 湖北 武汉 430072
  • 收稿日期:2023-03-13 出版日期:2024-02-15 发布日期:2024-02-22
  • 通讯作者: 何德彪
  • 基金资助:
    国家重点研发计划(2022YFB4400700)

High-speed Parallel Implementation of Lattice-based Cryptography Based on AVX512

Douwei LEI, Debiao HE*(), Min LUO, Cong PENG   

  1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China
  • Received:2023-03-13 Online:2024-02-15 Published:2024-02-22
  • Contact: Debiao HE

摘要:

量子计算的迅速发展可能对当前广泛使用的公钥密码算法造成严重威胁。格密码因优秀的抗量子安全性和高效的计算效率在后量子密码中占据重要地位。美国国家标准技术研究院于2022年5月公布4个后量子密码标准,其中3个是格密码算法,Kyber算法便是其中之一。随着后量子密码标准的确定,Kyber算法高效实现的需求日益增加。基于512位高级向量扩展(AVX512),对Kyber算法进行优化与高速并行实现。使用惰性模约减、优化的蒙哥马利模约减及优化的快速数论变化等技术,充分利用计算机的存储空间,减少大量不必要的模约减操作,提高多项式计算的效率与并行性。采用冗余比特技术,增强多项式抽样过程中比特的并行处理能力。通过AVX512的512 bit位宽和8路并行实现哈希运算,并对其产生的伪随机比特串进行合理调度,充分发挥并行性能。基于AVX512指令集高速并行实现Kyber上的多项式计算和抽样,并进一步实现整个Kyber公钥加密方案。性能测试结果表明,与C语言实现相比,基于AVX512实现的密钥生成和加密算法获得了10~16倍的加速,解密算法获得了约56倍的加速。

关键词: 后量子密码, 格密码, 公钥加密, 512位高级向量扩展指令集, 并行计算

Abstract:

The rapid development of quantum computing seriously threatens the security of widely used public-key cryptography. Lattice-based cryptography occupies an essential position in Post-Quantum Cryptography(PQC) owing to its excellent anti-quantum security and efficient computational efficiency. In May 2022, the National Institute of Standards and Technology(NIST) published four PQC standards, three of which are lattice-based cryptography algorithms, along with Kyber. With the identification of post-quantum standards, the importance and need for their efficient implementation is increasing. This study presents an optimized and high-speed parallel implementation of the Kyber algorithm based on the Advanced Vector eXtensions 512(AVX512). It utilizes techniques such as lazy reduction, optimized Montgomery modular reduction, and optimized Number-Theoretic Transformation(NTT) to reduce unnecessary modular reduction operations and improve the efficiency and parallelism of polynomial computations by fully utilizing computer storage space. It also employs redundant bit technology to improve the parallel processing capability of bits during polynomial sampling. The 512 bit width of AVX512 is utilized to perform 8-way parallel Hash operations, and the resulting pseudo-random bit strings are properly scheduled to fully leverage parallel performance. Finally, this study implements polynomial computations and sampling on Kyber in high-speed parallel using the AVX512 instruction set and further implements the entire Kyber public-key encryption scheme. Performance test results indicate that the key generation and encryption algorithms in this study achieve 10 to 16 times acceleration compared to the C language implementation provided in the standard documentation, while the decryption algorithm achieves approximately 56 times acceleration.

Key words: Post-Quantum Cryptography(PQC), lattice-based cryptography, public-key encryption, Advanced Vector eXtensions 512(AVX512) instruction set, parallel computation