多核处理器NoC的保证服务

Network on Chip Architectures Pub Date : 2014-12-13 DOI:10.1145/2685342.2685344

B. Dinechin, Y. Durand, D. V. Amstel, Alexandre Ghiti

{"title":"多核处理器NoC的保证服务","authors":"B. Dinechin, Y. Durand, D. V. Amstel, Alexandre Ghiti","doi":"10.1145/2685342.2685344","DOIUrl":null,"url":null,"abstract":"The Kalray MPPA®-256 processor (Multi-Purpose Processing Array) integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These cores are distributed across 16 compute clusters and 4 I/O subsystems. On-chip communications and synchronization are supported by an explicitly routed dual data & control network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem, for a total of 32 nodes. The data NoC is dedicated to streaming data transfers and may operate with guaranteed services, thanks to non-blocking routers and flow regulation at the source node. Its architecture has been designed so that (σ, ρ) network calculus applies with minimal approximations.\n Given a set of flows across this data NoC with predetermined routes, we formulate the problem of guaranteeing fair allocation of bandwidth across flows and we present bounds on the maximum transfer latency. By considering the architecture of the data NoC and by introducing conservative approximations, we show how this formulation can be transformed into a linear program. Solving this linear program is efficient and the quality of its solutions appears comparable to those of the original formulation, based on problem instances obtained from the cyclostatic dataflow compilation toolchain of the Kalray MPPA®-256 processor.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"62","resultStr":"{\"title\":\"Guaranteed Services of the NoC of a Manycore Processor\",\"authors\":\"B. Dinechin, Y. Durand, D. V. Amstel, Alexandre Ghiti\",\"doi\":\"10.1145/2685342.2685344\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Kalray MPPA®-256 processor (Multi-Purpose Processing Array) integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These cores are distributed across 16 compute clusters and 4 I/O subsystems. On-chip communications and synchronization are supported by an explicitly routed dual data & control network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem, for a total of 32 nodes. The data NoC is dedicated to streaming data transfers and may operate with guaranteed services, thanks to non-blocking routers and flow regulation at the source node. Its architecture has been designed so that (σ, ρ) network calculus applies with minimal approximations.\\n Given a set of flows across this data NoC with predetermined routes, we formulate the problem of guaranteeing fair allocation of bandwidth across flows and we present bounds on the maximum transfer latency. By considering the architecture of the data NoC and by introducing conservative approximations, we show how this formulation can be transformed into a linear program. Solving this linear program is efficient and the quality of its solutions appears comparable to those of the original formulation, based on problem instances obtained from the cyclostatic dataflow compilation toolchain of the Kalray MPPA®-256 processor.\",\"PeriodicalId\":344147,\"journal\":{\"name\":\"Network on Chip Architectures\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"62\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Network on Chip Architectures\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2685342.2685344\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Network on Chip Architectures","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2685342.2685344","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 62

摘要

Kalray MPPA®-256处理器(多用途处理阵列)在单个28nm CMOS芯片上集成了256个处理引擎(PE)内核和32个资源管理(RM)内核。这些核心分布在16个计算集群和4个I/O子系统中。片上通信和同步由明确路由的双数据和控制片上网络(NoC)支持，每个计算集群有一个节点，每个I/O子系统有4个节点，总共有32个节点。数据NoC专用于流数据传输，由于源节点的非阻塞路由器和流量调节，可以保证服务的运行。它的结构设计使(σ， ρ)网络演算适用于最小逼近。给定一组具有预定路由的数据流通过该数据NoC，我们制定了保证跨流公平分配带宽的问题，并给出了最大传输延迟的界限。通过考虑数据NoC的体系结构并引入保守近似，我们展示了如何将该公式转换为线性程序。求解这个线性程序是有效的，其解决方案的质量似乎与原始配方相当，基于从Kalray MPPA®-256处理器的循环静态数据流编译工具链获得的问题实例。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Guaranteed Services of the NoC of a Manycore Processor

The Kalray MPPA®-256 processor (Multi-Purpose Processing Array) integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These cores are distributed across 16 compute clusters and 4 I/O subsystems. On-chip communications and synchronization are supported by an explicitly routed dual data & control network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem, for a total of 32 nodes. The data NoC is dedicated to streaming data transfers and may operate with guaranteed services, thanks to non-blocking routers and flow regulation at the source node. Its architecture has been designed so that (σ, ρ) network calculus applies with minimal approximations. Given a set of flows across this data NoC with predetermined routes, we formulate the problem of guaranteeing fair allocation of bandwidth across flows and we present bounds on the maximum transfer latency. By considering the architecture of the data NoC and by introducing conservative approximations, we show how this formulation can be transformed into a linear program. Solving this linear program is efficient and the quality of its solutions appears comparable to those of the original formulation, based on problem instances obtained from the cyclostatic dataflow compilation toolchain of the Kalray MPPA®-256 processor.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Network on Chip Architectures

自引率

0.00%

发文量