
Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 283-290. doi: 10.19678/j.issn.1000-3428.0065018

• Development Research and Engineering Application •

A High-Speed Scalable Dual-Field Montgomery Modular Multiplier Architecture

Yi CHEN1, Xuan YANG2, Han ZENG1, Wei LI1

  1. School of Cryptographic Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450001, China
  2. Jiangnan Computing Technology Institute, Wuxi 214083, Jiangsu, China
  • Received: 2022-06-17 Publication date: 2023-08-15 Release date: 2022-09-22
  • About the authors:

    Yi CHEN (born 1998), male, master's student; main research interest: integrated circuit design

    Xuan YANG, engineer

    Han ZENG, master's student

    Wei LI, professor

  • Funding:
    National Defense Innovation Fund (2019_JCJQ_JJ_123)

Abstract:

To speed up Montgomery modular multiplication in hardware while maintaining high overall performance, a high-speed, scalable, dual-field Montgomery Modular Multiplier (MMM) algorithm and its hardware architecture are proposed for high-speed Elliptic Curve Cryptography (ECC) processors. Montgomery modular multiplication with a maximum operand width of 576 bit is implemented by iteratively invoking Karatsuba multiplication. Because part of the data in adjacent MMM operations is independent, that data is computed in advance, which reduces the number of clock cycles per modular multiplication. The Karatsuba algorithm uses wide additions repeatedly, which consumes considerable resources and creates very long carry chains. To address this, an addition-selection (carry-select) circuit based on a dual-field 4-2 compression transformation is designed: one very wide addition is split into several narrower additions, all of their results are obtained within a single clock cycle, and the final result is selected according to the carry outputs of those additions, which shortens the carry-chain delay. Experimental results show that, compared with other MMM implementations in Application-Specific Integrated Circuits (ASIC), the proposed algorithm and hardware architecture offer greater flexibility. Logic synthesis in a 65 nm Complementary Metal-Oxide-Semiconductor (CMOS) process gives a maximum clock frequency of 459 MHz and an area of 480 254 µm². Completing MMMs of 0-145 bit, 146-289 bit, 290-435 bit, and 436-576 bit takes only 8.72 ns, 23.98 ns, 58.86 ns, and 71.94 ns, respectively, and the design has a low area-time product.
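As a rough software illustration of how a Montgomery multiplication can be built on top of Karatsuba multiplication, the following Python sketch may help. It is not the paper's algorithm or hardware architecture: it covers only the prime-field case, omits the precomputation that saves clock cycles, and the names `karatsuba`, `montgomery_n_prime`, `mont_mul` and the constant `BASE_WIDTH` are hypothetical choices made for this example.

```python
BASE_WIDTH = 64  # assumed width at which a "native" multiplier takes over


def karatsuba(x: int, y: int, width: int) -> int:
    """Multiply non-negative x, y < 2**width with three half-size products."""
    if width <= BASE_WIDTH:
        return x * y  # stands in for a small hardware multiplier
    half = (width + 1) // 2
    mask = (1 << half) - 1
    x0, x1 = x & mask, x >> half
    y0, y1 = y & mask, y >> half
    low = karatsuba(x0, y0, half)
    high = karatsuba(x1, y1, half)
    # The operand sums can be one bit wider than the halves.
    mid = karatsuba(x0 + x1, y0 + y1, half + 1) - low - high
    return low + (mid << half) + (high << (2 * half))


def montgomery_n_prime(n: int, k: int) -> int:
    """Return -n^{-1} mod 2**k for odd n (needs Python 3.8+ for pow(n, -1, r))."""
    r = 1 << k
    return (-pow(n, -1, r)) % r


def mont_mul(a: int, b: int, n: int, n_prime: int, k: int) -> int:
    """Return a * b * R^{-1} mod n with R = 2**k, for a, b < n < 2**k, n odd."""
    mask = (1 << k) - 1
    t = karatsuba(a, b, k)                       # full product
    m = karatsuba(t & mask, n_prime, k) & mask   # m = (t mod R) * n' mod R
    u = (t + karatsuba(m, n, k)) >> k            # exact division by R
    return u - n if u >= n else u                # single conditional subtraction
```

To use it, operands are first mapped into Montgomery form (multiplied by R mod n), multiplied with `mont_mul`, and mapped back by a final `mont_mul(value, 1, ...)`. Setting `k = 576` mirrors the maximum operand width quoted in the abstract.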

Key words: Montgomery Modular Multiplier (MMM), Karatsuba multiplication, scalable, Elliptic Curve Cryptography (ECC), dual-field 4-2 compression transformation
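The addition scheme summarized in the abstract (split a wide addition into narrow ones, compute them all at once, then select by carry) follows the general carry-select idea. The Python sketch below models that idea only. It is an assumption-laden illustration, not the paper's dual-field 4-2 compression circuit, and the function name `carry_select_add` and the 64-bit chunk size are hypothetical.

```python
def carry_select_add(x: int, y: int, width: int, chunk: int = 64) -> int:
    """Return x + y for non-negative x, y < 2**width using chunked addition."""
    mask = (1 << chunk) - 1
    n_chunks = -(-width // chunk)  # ceiling division

    # Phase 1: every chunk computes both possible sums (carry-in 0 and 1).
    # The chunks do not depend on each other, so hardware can do this in parallel.
    candidates = []
    for i in range(n_chunks):
        xi = (x >> (i * chunk)) & mask
        yi = (y >> (i * chunk)) & mask
        candidates.append((xi + yi, xi + yi + 1))

    # Phase 2: a short selection chain picks one candidate per chunk,
    # driven by the carry-out of the chunk below it.
    result = 0
    carry = 0
    for i, (sum_c0, sum_c1) in enumerate(candidates):
        s = sum_c1 if carry else sum_c0
        result |= (s & mask) << (i * chunk)
        carry = s >> chunk
    return result | (carry << (n_chunks * chunk))
```

In this model the long carry chain of a 576 bit adder is replaced by nine independent 64 bit additions plus a chain of selections, which is the source of the delay reduction the abstract describes.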