Customized Network-on-Chip Oriented to MPI Collective Operations

doi:10.3969/j.issn.1000-3428.2017.06.001

Computer Engineering (计算机工程)

LU Siyu, WANG Hongwei, ZHANG Youhui, YANG Guangwen, ZHENG Weimin

(Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)

Received: 2016-05-16 Online: 2017-06-15 Published: 2017-06-15
About the authors: LU Siyu (born 1990), female, master's student, main research interest: computer architecture; WANG Hongwei, master's degree; ZHANG Youhui, YANG Guangwen, and ZHENG Weimin, professors.
Funding:
National "863" Program of China (2013AA01A215).

Abstract

Following the principle of moving computation close to the data, this paper proposes a design method for a customized Network-on-Chip (NoC) oriented to MPI collective operations. The method enhances the hardware of existing on-chip routers so that MPI collective operations are accelerated at the network layer. It first designs the MPI reduction operation, then extends the design to several other collective operations, and combines it with an adaptive method that targets deterministic routing algorithms and dynamically learns the transmission paths of messages. Collective operations can thus be completed in place on the enhanced routers, which speeds up processing, coalesces messages, and reduces the load on the processor cores. The paper also presents the micro-architecture design of the on-chip router, compares layout strategies for enhanced routers in different NoCs, and evaluates the corresponding performance, power consumption, and chip area. Test results show that, compared with the best software-based implementation, the proposed method improves reduction performance by 6.4 to 41.7 times, broadcast by 15.3 to 31.2 times, global reduction by 5.4 to 9.7 times, and gather by 1.3 to 1.8 times, at a limited cost in power and chip area.
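To make the idea of completing a reduction inside the routers more concrete, here is a minimal software sketch. It is not the paper's hardware design. It assumes a 2D mesh, dimension-order (XY) deterministic routing, and a sum operator, none of which the abstract specifies, and the function names are hypothetical. It compares how many link traversals a reduction needs when every router on the path combines incoming partial results, versus when every node sends its value unmodified to the root.

```python
from collections import defaultdict

def xy_next_hop(node, root):
    """Next router on an XY (dimension-order) path from node toward root.
    Assumption: 2D mesh with XY routing; the abstract only says 'deterministic'."""
    (x, y), (rx, ry) = node, root
    if x != rx:
        return (x + (1 if rx > x else -1), y)
    if y != ry:
        return (x, y + (1 if ry > y else -1))
    return None  # node is the root

def in_network_reduce(values, root, op=lambda a, b: a + b):
    """Combine partial results at every router on the way to root.
    values maps (x, y) -> local contribution and must cover the full mesh.
    Returns (result, link_traversals)."""
    # Each node has exactly one next hop, so the routes form a tree rooted at root.
    children = defaultdict(list)
    for node in values:
        parent = xy_next_hop(node, root)
        if parent is not None:
            children[parent].append(node)

    hops = 0

    def combine(node):
        nonlocal hops
        acc = values[node]
        for child in children[node]:
            acc = op(acc, combine(child))
            hops += 1  # one coalesced message crosses each tree edge
        return acc

    return combine(root), hops

def endpoint_reduce(values, root, op=lambda a, b: a + b):
    """Baseline: each node's value travels unmodified all the way to root."""
    acc, hops = values[root], 0
    for node, v in values.items():
        if node != root:
            acc = op(acc, v)
            hops += abs(node[0] - root[0]) + abs(node[1] - root[1])
    return acc, hops

if __name__ == "__main__":
    vals = {(x, y): x * 4 + y for x in range(4) for y in range(4)}
    print(in_network_reduce(vals, (0, 0)))  # (120, 15)
    print(endpoint_reduce(vals, (0, 0)))    # (120, 48)
```

On this 4x4 example both variants produce the same sum, but the in-router version sends one coalesced message per tree edge (15 traversals) instead of one full-length message per node (48 traversals). This toy count only illustrates message coalescing; it says nothing about the speedups reported in the abstract.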

Key words: Network-on-Chip(NoC), Chip Multi-Processor(CMP), Message Passing Interface(MPI), collective operation, customization

CLC number: TP393

LU Siyu, WANG Hongwei, ZHANG Youhui, YANG Guangwen, ZHENG Weimin. Customized Network-on-Chip Oriented to MPI Collective Operations[J]. Computer Engineering.


Link to this article: https://www.ecice06.com/CN/Y2017/V43/I6/1

Editor: JIN Hukao

