Research and Design of Out-of-Order Submission Mechanism for Superscalar Processor

doi:10.19678/j.issn.1000-3428.0057410

Computer Engineering, 2021, Vol. 47, Issue 4: 180-186.

Section: Computer Architecture and Software Technology

LI Zhao¹, LIU Youyao¹, JIAO Jiye², PAN Shupeng¹

1. School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710100, China;
2. School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710100, China

Received: 2020-02-17; Revised: 2020-04-02; Published: 2020-04-10

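About the authors: LI Zhao (born 1993), male, master's student, main research interest: application-specific integrated circuit (ASIC) design; LIU Youyao, professor, Ph.D.; JIAO Jiye, senior engineer, Ph.D.; PAN Shupeng, master's student.
Funding: National Natural Science Foundation of China (61874087, 61834005, 61634004).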

Abstract

To address Reorder Buffer (ROB) blocking in superscalar processors, caused by the delayed retirement of long-latency instructions while decoding continues, this paper proposes an out-of-order instruction submission (commit) mechanism. The mechanism uses a multi-buffer instruction submission structure with configurable capacity, so that memory-operation instructions and ALU instructions are retired separately by class. The capacities of the Target Buffer (TB) and the Memory Buffer (MB) are parameterized according to the architecture and performance requirements of the superscalar processor, reducing the risk of pipeline stalls. In addition, a submission mode based on encoding instruction destination registers is used to raise the instruction submission rate. Experimental results show that the mechanism increases the number of instructions submitted at a time. Compared with a conventional superscalar processor using ROB-based in-order submission, a superscalar processor based on the proposed mechanism improves average IPC by 46% while reducing hardware overhead; compared with superscalar processors based on value prediction, out-of-order retirement, and group commit schemes, its average IPC gain is 19%, giving better overall performance.
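The abstract describes the mechanism only at a high level, so the following Python sketch is a hypothetical toy model, not the authors' hardware design. It assumes that memory-operation instructions are placed in a Memory Buffer (MB) and retire in program order from its head, while ALU instructions are placed in a Target Buffer (TB) and may retire out of order once finished; both capacities are constructor parameters. The class names, the rule that an instruction waits while an older buffered instruction writes the same destination register (to avoid write-after-write reordering), and the per-cycle destination-register bitmask used to stand in for "destination register encoding" are all assumptions. Precise exceptions, branch mispredictions, and memory disambiguation are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    seq: int              # program-order sequence number
    kind: str             # "mem" (memory operation) or "alu"
    dest: Optional[int]   # destination register index, or None (e.g. a store)
    done: bool = False    # has execution finished?


class ClassifiedCommit:
    """Hypothetical toy model of classified retirement with two buffers:
    memory operations go to a Memory Buffer (MB), ALU operations to a
    Target Buffer (TB). Both capacities are constructor parameters."""

    def __init__(self, tb_cap: int, mb_cap: int, commit_width: int):
        self.tb_cap = tb_cap
        self.mb_cap = mb_cap
        self.commit_width = commit_width
        self.tb: List[Instr] = []   # kept in program order
        self.mb: List[Instr] = []   # kept in program order

    def dispatch(self, ins: Instr) -> bool:
        """Classify and insert an instruction; return False (stall) if its buffer is full."""
        if ins.kind == "mem":
            buf, cap = self.mb, self.mb_cap
        else:
            buf, cap = self.tb, self.tb_cap
        if len(buf) >= cap:
            return False
        buf.append(ins)
        return True

    def _waw_blocked(self, ins: Instr) -> bool:
        """True if an older, still-buffered instruction writes the same register."""
        if ins.dest is None:
            return False
        return any(o.seq < ins.seq and o.dest == ins.dest
                   for o in self.tb + self.mb)

    def commit_cycle(self) -> Tuple[int, int]:
        """Retire up to commit_width instructions in one cycle.

        Returns (number retired, bitmask with bit r set for every
        destination register r written by a retired instruction)."""
        retired, mask = 0, 0
        # MB: memory operations retire in program order from the head (assumption).
        while (retired < self.commit_width and self.mb and self.mb[0].done
               and not self._waw_blocked(self.mb[0])):
            ins = self.mb.pop(0)
            if ins.dest is not None:
                mask |= 1 << ins.dest
            retired += 1
        # TB: any finished ALU instruction may retire, scanned oldest first.
        for ins in list(self.tb):          # iterate over a snapshot
            if retired >= self.commit_width:
                break
            if ins.done and not self._waw_blocked(ins):
                self.tb.remove(ins)
                if ins.dest is not None:
                    mask |= 1 << ins.dest
                retired += 1
        return retired, mask


if __name__ == "__main__":
    c = ClassifiedCommit(tb_cap=4, mb_cap=2, commit_width=4)
    program = [
        Instr(0, "mem", 1),                # long-latency load to r1, still executing
        Instr(1, "alu", 2, done=True),     # writes r2
        Instr(2, "alu", 3, done=True),     # writes r3
        Instr(3, "mem", None, done=True),  # store, no destination register
        Instr(4, "alu", 1, done=True),     # writes r1: WAW with the load
    ]
    for ins in program:
        c.dispatch(ins)
    print(c.commit_cycle())  # (2, 12): r2 and r3 retire past the pending load
    c.mb[0].done = True      # the load completes
    print(c.commit_cycle())  # (3, 2): load, store, then the younger write to r1
```

In this run, the first cycle retires the two finished ALU instructions even though the long-latency load at the MB head is still executing; with a single ROB and in-order commit, that load would block both. The younger ALU write to r1 is held back until the older load to r1 has retired.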

Key words: superscalar processor, Reorder Buffer (ROB), classified instruction retirement, out-of-order submission, destination register encoding

CLC Number: TP338

LI Zhao, LIU Youyao, JIAO Jiye, PAN Shupeng. Research and Design of Out-of-Order Submission Mechanism for Superscalar Processor[J]. Computer Engineering, 2021, 47(4): 180-186.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0057410

http://www.ecice06.com/EN/Y2021/V47/I4/180

Figures/Tables 7
