A Near-optimal Real-time Hardware Scheduler for Large Cardinality Crossbar Switches

R. Hoare, Zhu Ding, A.K. Jones
ACM/IEEE SC 2006 Conference (SC'06), 2006-11-11. DOI: 10.1145/1188455.1188554
Citations: 6

Abstract

The maximum matching algorithm for bipartite graphs can be used to provide optimal scheduling for crossbar-based interconnection networks. Unfortunately, maximum matching requires O(N³) time for an N × N communication system, which has limited its application to real-time network scheduling. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By taking advantage of the inherent parallelism available in custom hardware design, we introduce three hardware implementations of maximum matching and show how design complexity can be traded for performance. Specifically, we examine a pure logic scheduler with three dimensions of parallelism, a matrix scheduler with two dimensions of parallelism, and a vector scheduler with one dimension of parallelism. These designs reduce the algorithmic time complexity to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K = 2N-1 steps, our simulation results show that the scheduler can achieve 99% of the optimal schedule when K = 9. We examine the hardware and time complexity of these architectures for crossbar sizes of up to N = 1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars ranging from 8 × 8 to 256 × 256 can be optimized in less than 20 ns per optimization step. For 1024 × 1024 crossbars, the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
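The core idea of the abstract is that crossbar scheduling is bipartite matching and that bit-vector (Boolean) operations map naturally onto parallel logic. That idea can be illustrated in software. The following is a minimal Python sketch and not the authors' hardware design. It makes several assumptions. The request matrix is stored as one N-bit integer per input, with bit j of `req[i]` set when input i has traffic for output j. A greedy schedule is built first, and the sketch then repeatedly searches for augmenting paths of at most `max_path_edges` edges. The names `greedy_match`, `augment_once`, `schedule` and `max_path_edges` are hypothetical. An augmenting path in an N × N bipartite graph has at most 2N-1 edges, so setting `max_path_edges = 2N-1` yields a maximum matching. This echoes the abstract's K = 2N-1 bound, but we do not claim it is how the paper defines an optimization step.

```python
from typing import List


def greedy_match(req: List[int], n: int) -> List[int]:
    """Greedy initial schedule: each input, in index order, takes its
    lowest-numbered requested output that is still free.
    Returns match[i] = granted output, or -1 if input i got nothing."""
    used = 0                      # bit j set => output j already granted
    match = [-1] * n
    for i in range(n):
        avail = req[i] & ~used    # requested AND not yet granted
        if avail:
            low = avail & -avail  # isolate the lowest set bit
            match[i] = low.bit_length() - 1
            used |= low
    return match


def augment_once(req: List[int], match: List[int], n: int,
                 max_path_edges: int) -> bool:
    """Layered bit-vector BFS from all unmatched inputs at once. If an
    augmenting path of at most max_path_edges edges exists, flip it
    (growing the matching by one) and return True; otherwise return False."""
    owner = [-1] * n              # owner[j] = input currently holding output j
    for i, j in enumerate(match):
        if j >= 0:
            owner[j] = i
    frontier = 0                  # bitmask of inputs in the current BFS layer
    for i in range(n):
        if match[i] < 0 and req[i]:
            frontier |= 1 << i
    parent = [-1] * n             # parent[j] = input through which output j was reached
    visited = 0                   # bitmask of outputs already reached
    path_edges = 1                # edges in a path ending at this layer's outputs
    while frontier and path_edges <= max_path_edges:
        next_frontier = 0
        for i in range(n):
            if frontier >> i & 1:
                new = req[i] & ~visited
                visited |= new
                while new:
                    low = new & -new
                    new ^= low
                    j = low.bit_length() - 1
                    parent[j] = i
                    if owner[j] < 0:
                        # Free output reached: flip matched/unmatched edges
                        # along the path back to an unmatched input.
                        while j >= 0:
                            i_prev = parent[j]
                            old = match[i_prev]
                            match[i_prev] = j
                            j = old
                        return True
                    next_frontier |= 1 << owner[j]
        frontier = next_frontier
        path_edges += 2           # each layer adds a matched and an unmatched edge
    return False


def schedule(req: List[int], n: int, max_path_edges: int) -> List[int]:
    """Greedy schedule followed by bounded augmenting-path improvement."""
    match = greedy_match(req, n)
    while augment_once(req, match, n, max_path_edges):
        pass
    return match
```

As a usage example, take a 4 × 4 request pattern where input 0 wants outputs {0, 1}, input 1 wants {1, 2}, input 2 wants {2, 3}, and input 3 wants only {0}. That is `req = [0b0011, 0b0110, 0b1100, 0b0001]`. The greedy pass grants `[0, 1, 2, -1]` and leaves input 3 idle. The only augmenting path from input 3 has 7 = 2N-1 edges. So `schedule(req, 4, 7)` returns the perfect matching `[1, 2, 3, 0]`, while `schedule(req, 4, 5)` stays at `[0, 1, 2, -1]`. This shows how bounding the search length trades schedule quality for time.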