A Near-optimal Real-time Hardware Scheduler for Large Cardinality Crossbar Switches

ACM/IEEE SC 2006 Conference (SC'06) | Pub Date: 2006-11-11 | DOI: 10.1145/1188455.1188554

R. Hoare, Zhu Ding, A.K. Jones


Citations: 6

Abstract

The maximum matching algorithm for bipartite graphs can be used to provide optimal scheduling for crossbar-based interconnection networks. Unfortunately, maximum matching requires O(N^3) time for an N × N communication system, which has limited its application to real-time network scheduling. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By taking advantage of the inherent parallelism available in custom hardware design, we introduce three maximum matching implementations in hardware and show how we can trade design complexity for performance. Specifically, we examine a pure logic scheduler with three dimensions of parallelism, a matrix scheduler with two dimensions of parallelism, and a vector scheduler with one dimension of parallelism. These designs reduce the algorithmic time complexity to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K = 2N-1 steps, our simulation results show that the scheduler can achieve 99% of the optimal schedule when K = 9. We examine the hardware and time complexity of these architectures for crossbar sizes of up to N = 1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars of various sizes, ranging from 8 × 8 to 256 × 256, can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024 × 1024, the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
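
To make the "greedy schedule, then a bounded number of optimization steps" idea concrete, here is a minimal software sketch in Python. It is not the paper's Boolean hardware formulation: it uses the textbook augmenting-path method on a Boolean request matrix, and the names greedy_schedule, optimize, and the parameter k are hypothetical. In particular, k here caps the number of augmentation attempts, which is an assumption and may not correspond exactly to the K defined in the abstract.

```python
from typing import List, Optional

Matrix = List[List[bool]]          # requests[i][j]: input i wants output j
Schedule = List[Optional[int]]     # schedule[i]: output granted to input i


def greedy_schedule(requests: Matrix) -> Schedule:
    """Grant each input the first requested output that is still free."""
    n = len(requests)
    output_taken = [False] * n
    schedule: Schedule = [None] * n
    for i in range(n):
        for j in range(n):
            if requests[i][j] and not output_taken[j]:
                schedule[i] = j
                output_taken[j] = True
                break
    return schedule


def _augment(requests: Matrix, i: int,
             owner_of_output: List[Optional[int]],
             visited: List[bool]) -> bool:
    """Try to find an augmenting path starting at free input i (recursive)."""
    for j, wanted in enumerate(requests[i]):
        if wanted and not visited[j]:
            visited[j] = True
            owner = owner_of_output[j]
            # Take output j if it is free, or if its owner can move elsewhere.
            if owner is None or _augment(requests, owner, owner_of_output, visited):
                owner_of_output[j] = i
                return True
    return False


def optimize(requests: Matrix, schedule: Schedule, k: int) -> Schedule:
    """Improve a schedule with at most k augmentation attempts (assumed bound)."""
    n = len(requests)
    owner_of_output: List[Optional[int]] = [None] * n
    for i, j in enumerate(schedule):
        if j is not None:
            owner_of_output[j] = i
    attempts = 0
    for i in range(n):
        if attempts >= k:
            break
        # Only inputs left unmatched by the greedy pass need an attempt.
        if schedule[i] is None and any(requests[i]):
            attempts += 1
            _augment(requests, i, owner_of_output, [False] * n)
    improved: Schedule = [None] * n
    for j, i in enumerate(owner_of_output):
        if i is not None:
            improved[i] = j
    return improved


if __name__ == "__main__":
    reqs = [[True, True, False],
            [True, False, False],
            [False, True, True]]
    greedy = greedy_schedule(reqs)
    print(greedy)                     # [0, None, 1]
    print(optimize(reqs, greedy, 3))  # [1, 0, 2]
```

In this 3 × 3 example the greedy pass leaves input 1 unserved, and a single augmentation reroutes inputs 0 and 2 so that all three inputs are granted an output. The recursion is suitable only for small N; a hardware scheduler of the kind the abstract describes would evaluate these steps in parallel rather than sequentially.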

