作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2026, Vol. 52 ›› Issue (1): 282-292. doi: 10.19678/j.issn.1000-3428.0069229

• 网络空间安全 • 上一篇    下一篇

面积高效的格密码多项式乘法硬件实现

谢家兴1, 蒲金伟1, 方伟钿1, 郑欣2,*(), 熊晓明2   

  1. 1. 广东工业大学自动化学院, 广东 广州 510006
    2. 广东工业大学集成电路学院, 广东 广州 510006
  • 收稿日期:2024-01-15 修回日期:2024-07-24 出版日期:2026-01-15 发布日期:2024-10-15
  • 通讯作者: 郑欣
  • 作者简介:

    谢家兴, 男, 硕士研究生, 主研方向为公钥密码算法硬件加速

    蒲金伟, 硕士研究生

    方伟钿, 硕士研究生

    郑欣(通信作者), 副教授、博士

    熊晓明, 教授、博士

  • 基金资助:
    广东省基础与应用基础研究基金(2021A1515110777); 广东省重点领域研发计划(2022B0701180001)

Area-Efficient Polynomial Multiplication Hardware Implementation for Lattice-based Cryptography

XIE Jiaxing1, PU Jinwei1, FANG Weitian1, ZHENG Xin2,*(), XIONG Xiaoming2   

  1. 1. School of Automation, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
    2. School of Integrated Circuits, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  • Received:2024-01-15 Revised:2024-07-24 Online:2026-01-15 Published:2024-10-15
  • Contact: ZHENG Xin

摘要:

基于格的后量子密码算法在公钥密码领域具有广泛的应用前景, 多项式乘法的计算复杂性是其硬件实现的主要性能瓶颈。针对多项式乘法实现存在的面积效率低和内存映射冲突等问题, 提出一种基于部分数论变换(PNTT)和系数交叉运算(CCO)的多项式乘法结构。首先, 将数论变换(NTT)最后一轮、系数相乘和逆数论变换(INTT)第一轮融合成CCO, 减少2轮蝶形运算和50%的旋转因子存储空间, 降低内存访问开销; 其次, 采用轻量级硬件分别实现模加、模减、除2运算以及优化后的基于Barrett的模乘运算, 有效减少逻辑资源开销, 同时采用流水线、分时复用技术设计可重构运算单元(PE)阵列, 使得各运算单元可以在不同变换下进行高效重组连接; 此外, 在内存映射方案上引入系数分组存储和特殊内存映射方法, 利用地址映射规律对数据和旋转因子实现高效调度, 避免内存映射冲突问题, 以低成本实现内存访问; 最后, 采用先入先出(FIFO)结构实现数据重组, 提升数据访问效率。实验结果显示, 所提出的PM结构在Slices和数字信号处理器(DSP)的面积延时积(ATP)指标上相比于现有相关工作分别降低21.7%和61.1%以上, 具有更高的面积效率。

关键词: 格密码, 多项式乘法, 数论变换, 模约简, 无冲突内存映射

Abstract:

Lattice-based post-quantum cryptography algorithms demonstrate significant potential in public-key cryptography. A key performance bottleneck in hardware implementation is the computational complexity of polynomial multiplication. To address the problems of low area efficiency and memory mapping conflicts encountered in polynomial multiplication, this study proposes a polynomial multiplication structure based on Partial Number Theoretic Transform (PNTT) and a Coefficient Crossover Operation (CCO). First, the last round of the Number Theoretic Transform (NTT), coefficient multiplication, and the first round of the Inverse Number Theoretic Transform (INTT) are merged into a CCO, reducing two rounds of butterfly operations and 50% of the twiddle factor storage space; consequently, memory access overhead is lowered. Second, lightweight hardware is employed to implement modular addition, modular subtraction, division by two, and enhanced Barrett-based modular multiplication, effectively reducing the logical resource overhead. Simultaneously, the study designs a reconfigurable Processing Element (PE) array using pipeline and time-sharing multiplexing techniques, allowing each operation unit to be efficiently reconnected under different transformations. In addition, the study introduces coefficient grouping storage and special memory mapping methods in the memory mapping scheme. The efficient scheduling of data and twiddle factors is achieved by leveraging address-mapping rules, avoiding memory mapping conflicts, and achieving low-cost memory access. Finally, a First Input First Output (FIFO) structure is employed for data reorganization, which enhances data access efficiency. Experimental results show that the proposed polynomial multiplication structure reduces the Area-Time Product (ATP) of Slices and Digital Signal Processor (DSP) by over 21.7% and 61.1%, respectively, compared to existing works and has a higher area efficiency.

Key words: lattice-based cryptography, polynomial multiplication, Number Theoretic Transform (NTT), modular reduction, conflict-free memory mapping