Research and Implementation of Reconfigurable Cryptographic Processors

doi:10.19678/j.issn.1000-3428.0253065

Abstract

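To meet the need to support multiple algorithms, a reconfigurable application-specific instruction-set processor is designed to efficiently execute both block ciphers and hash functions. The architecture uses a very long instruction word (VLIW) structure with symmetric clustered execution units and a cross-cluster register access mechanism, so that logic operations, shifts, and table lookups can execute in parallel. The instruction set introduces fused logic-and-lookup instructions, multi-mode shift instructions, and vector operations, which reduce pipeline stalls and increase instruction density. The pipeline has three stages (fetch, decode, and execute), and a bypass mechanism resolves data hazards and shortens the critical path. For algorithm mapping and optimization, the block ciphers SM4 and AES use T-box lookups and four-cluster parallel scheduling to compress each round to 4 and 7 cycles, respectively. The hash functions SHA-256 and SM3 are implemented by fusing multi-mode shift instructions with Boolean logic instructions, keeping each round at 8 cycles. SHA-3 uses a three-phase mapping strategy that regroups its five step mappings into three pipelined phases, significantly reducing stalls caused by data dependencies. The design was synthesized on a Xilinx Kintex-7 FPGA (XC7K325TFFG676-2), consuming 11,105 look-up tables (LUTs), 1,564 flip-flops (FFs), and 25 block RAMs (BRAMs) at a clock frequency of 125 MHz. Under these conditions, the processor achieves throughputs of 125 Mbps for SM4, 228.6 Mbps for AES, 125 Mbps for SHA-256, 125 Mbps for SM3, and 75.6 Mbps for SHA-3. The experimental results show that the architecture provides unified acceleration of multiple algorithms with low resource overhead, outperforms general-purpose processor extension schemes, and offers good flexibility and scalability.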
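The abstract states that the AES and SM4 rounds are built from T-box lookups. As an illustration only, the Python sketch below builds the standard AES T-tables from the S-box and checks that four lookups plus three XORs reproduce SubBytes followed by MixColumns for one column. It is not the authors' instruction-level implementation: the helper names (`gf_mul`, `T0` to `T3`, `tbox_column`) are hypothetical, the round-key XOR is omitted, and SM4, which uses a different S-box and linear transform, is not reproduced here.

```python
"""Illustrative AES T-box (T-table) construction; not the paper's hardware mapping."""

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES reduction polynomial


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    return next(b for b in range(1, 256) if gf_mul(a, b) == 1)


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(a: int) -> int:
    """AES S-box: field inversion followed by the affine transformation."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]


def pack(b0: int, b1: int, b2: int, b3: int) -> int:
    """Pack four bytes into a 32-bit word, b0 in the most significant byte."""
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF


# T0[a] holds the column (2*S[a], S[a], S[a], 3*S[a]); T1..T3 are byte rotations of T0.
T0 = [pack(gf_mul(s, 2), s, s, gf_mul(s, 3)) for s in SBOX]
T1 = [rotr32(t, 8) for t in T0]
T2 = [rotr32(t, 16) for t in T0]
T3 = [rotr32(t, 24) for t in T0]


def tbox_column(a0: int, a1: int, a2: int, a3: int) -> int:
    """SubBytes + MixColumns on one column: four table lookups and three XORs.

    a0..a3 are the state bytes that ShiftRows routes into this column.
    """
    return T0[a0] ^ T1[a1] ^ T2[a2] ^ T3[a3]


def mix_column(col) -> int:
    """Reference MixColumns on an already-substituted column (b0, b1, b2, b3)."""
    b0, b1, b2, b3 = col
    m = gf_mul
    return pack(m(b0, 2) ^ m(b1, 3) ^ b2 ^ b3,
                b0 ^ m(b1, 2) ^ m(b2, 3) ^ b3,
                b0 ^ b1 ^ m(b2, 2) ^ m(b3, 3),
                m(b0, 3) ^ b1 ^ b2 ^ m(b3, 2))


if __name__ == "__main__":
    print(hex(T0[0]))  # 0xc66363a5
    col = (0x00, 0x53, 0xA7, 0xFF)
    assert tbox_column(*col) == mix_column([SBOX[x] for x in col])
```

In this formulation one AES round needs 16 lookups (four per column) plus the round-key XOR. Assigning one column to each of four clusters is one plausible way to share that work, but the abstract does not describe the exact schedule behind the 7-cycle round.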

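The reported throughputs can be cross-checked with a simple model. The sketch below assumes that throughput = block size × clock frequency / (rounds × cycles per round), with 128-bit blocks and 32 rounds for SM4, AES-128 (10 rounds, an assumption since the abstract only says "AES"), 512-bit message blocks and 64 rounds for SHA-256 and SM3, and no overhead for key expansion, padding, or data movement. The function names are hypothetical. The abstract gives no per-round cycle count for SHA-3, so the sketch only inverts the model under the further, unconfirmed assumption that the 75.6 Mbps figure refers to SHA3-256 with its 1088-bit rate.

```python
"""Back-of-the-envelope throughput model for a round-iterated crypto core.

Assumption (not stated in the paper): throughput = block size x clock
frequency / (rounds x cycles per round), ignoring key expansion, message
padding, and data-loading overhead.
"""

F_CLK_HZ = 125e6  # 125 MHz, the clock frequency reported in the abstract


def throughput_mbps(block_bits: int, rounds: int, cycles_per_round: int,
                    f_hz: float = F_CLK_HZ) -> float:
    """Steady-state throughput in Mbps when one block is processed at a time."""
    cycles_per_block = rounds * cycles_per_round
    return block_bits * f_hz / cycles_per_block / 1e6


def implied_cycles_per_block(block_bits: int, mbps: float,
                             f_hz: float = F_CLK_HZ) -> float:
    """Invert the model: cycles per block implied by a reported throughput."""
    return block_bits * f_hz / (mbps * 1e6)


# (block size in bits, number of rounds, cycles per round from the abstract)
CONFIGS = {
    "SM4": (128, 32, 4),
    "AES-128": (128, 10, 7),  # assumed key size
    "SHA-256": (512, 64, 8),
    "SM3": (512, 64, 8),
}

if __name__ == "__main__":
    for name, (bits, rounds, cpr) in CONFIGS.items():
        print(f"{name}: {throughput_mbps(bits, rounds, cpr):.1f} Mbps")
    # Hypothetical: if the SHA-3 figure were for SHA3-256 (rate 1088 bits),
    # 75.6 Mbps would imply roughly 1799 cycles per 24-round permutation.
    print(f"SHA3-256 (assumed): {implied_cycles_per_block(1088, 75.6):.0f} cycles/block")
```

Under these assumptions the model prints 125.0, 228.6, 125.0, and 125.0 Mbps, matching the SM4, AES, SHA-256, and SM3 figures above. This is consistent with, though it does not prove, the reported numbers being steady-state per-block rates that exclude setup overhead.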

PANG Ruixiang, WAN Li, ZHANG Zhi, WU Lulu, ZHOU Youlong. Research and Implementation of Reconfigurable Cryptographic Processors[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0253065.

