Packet Switched vs. Time Multiplexed FPGA Overlay Networks

Nachiket Kapre, Nikil Mehta, Michael DeLorimier, Raphael Rubin, Henry Barnor, M. J. Wilson, M. Wrighton, A. DeHon
{"title":"Packet Switched vs. Time Multiplexed FPGA Overlay Networks","authors":"Nachiket Kapre, Nikil Mehta, Michael DeLorimier, Raphael Rubin, Henry Barnor, M. J. Wilson, M. Wrighton, A. DeHon","doi":"10.1109/FCCM.2006.55","DOIUrl":null,"url":null,"abstract":"Dedicated, spatially configured FPGA interconnect is efficient for applications that require high throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g. wiring up gates and datapaths). Applications which virtualize PEs may require a large number of distinct PE-to-PE connections (e.g. using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE's operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources for all possible connections is costly and inefficient. Alternatively, we can time share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g. area, route latency, route quality) between time-multiplexed and packet-switched networks overlayed on top of commodity FPGAs. We demonstrate modular and scalable networks which operate on a Xilinx XC2V6000-4 at 166MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When applying designs to equivalent area, packet-switching is up to 2times faster for small area designs while time-multiplexing is up to 5times faster for larger area designs. When limited to the capacity of a XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing","PeriodicalId":123057,"journal":{"name":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"150","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2006.55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 150

Abstract

Dedicated, spatially configured FPGA interconnect is efficient for applications that require high throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g. wiring up gates and datapaths). Applications which virtualize PEs may require a large number of distinct PE-to-PE connections (e.g. using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE's operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources for all possible connections is costly and inefficient. Alternatively, we can time share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g. area, route latency, route quality) between time-multiplexed and packet-switched networks overlayed on top of commodity FPGAs. We demonstrate modular and scalable networks which operate on a Xilinx XC2V6000-4 at 166MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When applying designs to equivalent area, packet-switching is up to 2times faster for small area designs while time-multiplexing is up to 5times faster for larger area designs. When limited to the capacity of a XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing
分组交换与时间复用FPGA覆盖网络
专用的、空间配置的FPGA互连对于需要处理元素(PE)之间高吞吐量连接但PE互连程度有限(例如连接门和数据路径)的应用是有效的。虚拟化PE的应用程序可能需要大量不同的PE到PE连接(例如,使用一个PE来模拟100个运营商,每个运营商都需要来自数千个其他运营商的输入数据),但是与PE的操作周期时间相比,每个连接的吞吐量都很低。在这些高度互联的条件下,将空间互联资源用于所有可能的连接是昂贵且低效的。或者,我们可以通过虚拟化互连链路来分时共享物理网络资源,或者在运行时之前静态地调度资源共享,或者在运行时动态地协商资源。我们探讨了覆盖在商品fpga之上的时间复用和分组交换网络之间的权衡(例如面积,路由延迟,路由质量)。我们演示了模块化和可扩展的网络,该网络在Xilinx XC2V6000-4上运行,频率为166MHz。对于我们的应用程序,时间复用,离线调度提供了高达63%的性能提高比在线,包交换调度等效拓扑。当将设计应用于等效面积时,小面积设计的分组交换速度可达2倍,而大面积设计的时间复用速度可达5倍。当受限于XC2V6000的容量时,如果所有通信都是已知的,则时间复用路由优于分组交换;但是,当活动链路集下降到潜在链路的40%以下时,分组交换路由可以优于时间复用
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信