使用内存映射网络接口的smp集群上平铺嵌套循环的流水线调度

ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI:10.1109/SC.2002.10008

Maria Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris

{"title":"使用内存映射网络接口的smp集群上平铺嵌套循环的流水线调度","authors":"Maria Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris","doi":"10.1109/SC.2002.10008","DOIUrl":null,"url":null,"abstract":"This paper describes the performance benefits attained using enhanced network interfaces to achieve low latency communication. We present a novel, pipelined scheduling approach which takes advantage of DMA communication mode, to send data to other nodes, while the CPUs are performing calculations. We also use zero-copy communication through pinned-down physical memory regions, provided by NIC’s driver modules. Our testbed concerns the parallel execution of tiled nested loops onto a cluster of SMP nodes with single PCI-SCI NICs inside each node. In order to schedule tiles, we apply a hyperplane-based grouping transformation to the tiled space, so as to group together independent neighboring tiles and assign them to the same SMP node. Experimental evaluation illustrates that memory mapped NICs with enhanced communication features enable the use of a more advanced pipelined (overlapping) schedule, which considerably improves performance, compared to an ordinary blocking schedule, implemented with conventional, CPU and kernel bounded, communication primitives.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"157 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces\",\"authors\":\"Maria Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris\",\"doi\":\"10.1109/SC.2002.10008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes the performance benefits attained using enhanced network interfaces to achieve low latency communication. We present a novel, pipelined scheduling approach which takes advantage of DMA communication mode, to send data to other nodes, while the CPUs are performing calculations. We also use zero-copy communication through pinned-down physical memory regions, provided by NIC’s driver modules. Our testbed concerns the parallel execution of tiled nested loops onto a cluster of SMP nodes with single PCI-SCI NICs inside each node. In order to schedule tiles, we apply a hyperplane-based grouping transformation to the tiled space, so as to group together independent neighboring tiles and assign them to the same SMP node. Experimental evaluation illustrates that memory mapped NICs with enhanced communication features enable the use of a more advanced pipelined (overlapping) schedule, which considerably improves performance, compared to an ordinary blocking schedule, implemented with conventional, CPU and kernel bounded, communication primitives.\",\"PeriodicalId\":302800,\"journal\":{\"name\":\"ACM/IEEE SC 2002 Conference (SC'02)\",\"volume\":\"157 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM/IEEE SC 2002 Conference (SC'02)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SC.2002.10008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM/IEEE SC 2002 Conference (SC'02)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.2002.10008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

本文描述了使用增强的网络接口实现低延迟通信所获得的性能优势。我们提出了一种新颖的流水线调度方法，它利用DMA通信模式，在cpu执行计算时将数据发送到其他节点。我们也使用零拷贝通信通过固定的物理内存区域，由网卡的驱动模块提供。我们的测试平台涉及在SMP节点集群上并行执行平铺嵌套循环，每个节点中都有单个PCI-SCI网卡。为了对贴片进行调度，我们对贴片空间进行了基于超平面的分组变换，将独立的相邻贴片分组在一起，并将它们分配到同一个SMP节点。实验评估表明，具有增强通信特性的内存映射网卡可以使用更高级的流水线(重叠)调度，与使用传统的、CPU和内核受限的通信原语实现的普通阻塞调度相比，这大大提高了性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces

This paper describes the performance benefits attained using enhanced network interfaces to achieve low latency communication. We present a novel, pipelined scheduling approach which takes advantage of DMA communication mode, to send data to other nodes, while the CPUs are performing calculations. We also use zero-copy communication through pinned-down physical memory regions, provided by NIC’s driver modules. Our testbed concerns the parallel execution of tiled nested loops onto a cluster of SMP nodes with single PCI-SCI NICs inside each node. In order to schedule tiles, we apply a hyperplane-based grouping transformation to the tiled space, so as to group together independent neighboring tiles and assign them to the same SMP node. Experimental evaluation illustrates that memory mapped NICs with enhanced communication features enable the use of a more advanced pipelined (overlapping) schedule, which considerably improves performance, compared to an ordinary blocking schedule, implemented with conventional, CPU and kernel bounded, communication primitives.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM/IEEE SC 2002 Conference (SC'02)

自引率

0.00%

发文量