优化了基于网格的NoC多处理器的Reduce

A. Kohler, M. Radetzki
{"title":"优化了基于网格的NoC多处理器的Reduce","authors":"A. Kohler, M. Radetzki","doi":"10.1109/IPDPSW.2012.111","DOIUrl":null,"url":null,"abstract":"Future processors are expected to be made up of a large number of computation cores interconnected by fast on-chip networks (Network-on-Chip, NoC). Such distributed structures motivate the use of message passing programming models similar to MPI. Since the properties of these networks, like e.g. the topology, are known and fixed after production, this knowledge can be used to optimize the communication stack. We describe two schemes that take advantage of this to accelerate the (All-)Reduce operation defined in MPI, namely a contention avoiding rank-to-core mapping and a way of interleaving communication and computation by means of pipelining. Simulations show that the combination of both schemes can accelerate (All-)Reduce operations by more than 60%.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Optimized Reduce for Mesh-Based NoC Multiprocessors\",\"authors\":\"A. Kohler, M. Radetzki\",\"doi\":\"10.1109/IPDPSW.2012.111\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Future processors are expected to be made up of a large number of computation cores interconnected by fast on-chip networks (Network-on-Chip, NoC). Such distributed structures motivate the use of message passing programming models similar to MPI. Since the properties of these networks, like e.g. the topology, are known and fixed after production, this knowledge can be used to optimize the communication stack. We describe two schemes that take advantage of this to accelerate the (All-)Reduce operation defined in MPI, namely a contention avoiding rank-to-core mapping and a way of interleaving communication and computation by means of pipelining. Simulations show that the combination of both schemes can accelerate (All-)Reduce operations by more than 60%.\",\"PeriodicalId\":378335,\"journal\":{\"name\":\"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-05-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2012.111\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2012.111","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

未来的处理器预计将由大量的计算核心组成,这些计算核心通过快速的片上网络(片上网络,NoC)相互连接。这种分布式结构鼓励使用类似于MPI的消息传递编程模型。由于这些网络的属性(例如拓扑)在生产后是已知和固定的,因此可以使用这些知识来优化通信堆栈。我们描述了两种利用这一点来加速MPI中定义的(All-)Reduce操作的方案,即避免秩到核映射的争用和通过管道交错通信和计算的方法。仿真结果表明,两种方案的组合可使(All-)Reduce运算速度提高60%以上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Optimized Reduce for Mesh-Based NoC Multiprocessors
Future processors are expected to be made up of a large number of computation cores interconnected by fast on-chip networks (Network-on-Chip, NoC). Such distributed structures motivate the use of message passing programming models similar to MPI. Since the properties of these networks, like e.g. the topology, are known and fixed after production, this knowledge can be used to optimize the communication stack. We describe two schemes that take advantage of this to accelerate the (All-)Reduce operation defined in MPI, namely a contention avoiding rank-to-core mapping and a way of interleaving communication and computation by means of pipelining. Simulations show that the combination of both schemes can accelerate (All-)Reduce operations by more than 60%.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信