Timing aware partitioning for multi-FPGA based logic simulation using top-down selective hierarchy flattening

2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI:10.1109/ICCD.2012.6378634

S. Swaminathan, P. Lin, S. Khatri


Citations: 4

Abstract

To accelerate logic simulation, it is highly beneficial to simulate the circuit design on FPGA hardware. This is often referred to as emulation, and we use the terms simulation and emulation interchangeably in this paper. However, limited hardware on FPGAs prevents large designs from being implemented on a single FPGA. Hence there is a need to partition the design and simulate it on a multi-FPGA platform. In contrast to existing FPGA-based post-synthesis partitioning approaches, which first completely flatten the circuit and then possibly perform bottom-up clustering, we perform a selective top-down flattening and thereby avoid the potential netlist blowup. This also allows us to preserve the design hierarchy to guide the partitioning and to make subsequent debugging easier. Our approach analyzes the hierarchical design and selectively flattens instances using two metrics based on slack. The resulting partially flattened netlist is converted to a hypergraph, partitioned using hMetis, and converted back into multiple FPGA netlists, one for each FPGA of the FPGA-based accelerated logic simulation platform. We compare our approach with a partitioning approach that operates on a completely flattened netlist. Static timing analysis was performed for both approaches. Across 15 large examples from the OpenCores project, our approach yields a 52% logic simulation speedup and about 0.74× runtime for the entire flow, compared to the completely flat approach. The entire tool chain of our approach is automated in an end-to-end flow, from hierarchy extraction through selective flattening and partitioning to netlist reconstruction. Compared to an existing method which also performs slack-based partitioning of a hierarchical netlist, we obtain a 35% simulation speedup. Our method scales very well, yielding a significantly better simulation speedup and runtime improvement for larger examples.
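
The flow described above (selective top-down flattening, then conversion to a hypergraph for hMetis) can be illustrated with a minimal Python sketch. The abstract does not define the two slack-based metrics, so the rule used here is purely an assumption for illustration: an instance is opened up when it exceeds a hypothetical area budget (`max_area`) or when its worst slack is above a hypothetical limit (`slack_limit`). The `Instance` class, the 10:1 edge weighting, and the toy design are likewise invented for the example and are not taken from the paper. Only the output layout follows the hMetis hypergraph file format (header `edges vertices 11`, weighted hyperedge lines, then vertex weights).

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One node of the design hierarchy (hypothetical model, not the paper's)."""
    name: str
    area: int = 0             # leaf area, e.g. LUT count; assumed to live on leaves only
    worst_slack: float = 0.0  # worst timing slack through this instance, in ns
    children: list = field(default_factory=list)

    def total_area(self):
        return self.area + sum(c.total_area() for c in self.children)

    def leaves(self):
        if not self.children:
            return [self.name]
        return [n for c in self.children for n in c.leaves()]


def selectively_flatten(inst, max_area, slack_limit):
    """Top-down pass returning the instances kept as atomic partitioning blocks.

    Assumed rule: open an instance if it is too big for one partition or if it
    is not timing critical; otherwise keep it whole so it cannot be cut.
    """
    too_big = inst.total_area() > max_area
    non_critical = inst.worst_slack > slack_limit
    if inst.children and (too_big or non_critical):
        blocks = []
        for child in inst.children:
            blocks.extend(selectively_flatten(child, max_area, slack_limit))
        return blocks
    return [inst]


def to_hmetis(blocks, nets, slack_limit):
    """Emit hMetis .hgr text; nets is a list of (leaf_names, net_slack)."""
    # Map every leaf to the 1-based index of the block that contains it.
    owner = {leaf: i for i, b in enumerate(blocks, start=1) for leaf in b.leaves()}
    edges = []
    for leaf_names, slack in nets:
        pins = sorted({owner[n] for n in leaf_names})
        if len(pins) > 1:  # nets internal to one block disappear
            weight = 10 if slack < slack_limit else 1  # assumed weighting
            edges.append((weight, pins))
    lines = [f"{len(edges)} {len(blocks)} 11"]  # fmt 11: edge and vertex weights
    lines += [" ".join(map(str, [w, *pins])) for w, pins in edges]
    lines += [str(b.total_area()) for b in blocks]
    return "\n".join(lines) + "\n"


# Toy design, invented for illustration.
top = Instance("top", worst_slack=5.0, children=[
    Instance("cpu", worst_slack=0.2, children=[
        Instance("cpu/alu", area=300, worst_slack=0.2),
        Instance("cpu/regs", area=200, worst_slack=1.5),
    ]),
    Instance("dma", worst_slack=4.0, children=[
        Instance("dma/ctrl", area=100, worst_slack=4.0),
        Instance("dma/fifo", area=150, worst_slack=6.0),
    ]),
    Instance("uart", area=50, worst_slack=8.0),
])
nets = [
    (["cpu/alu", "cpu/regs"], 0.2),           # internal to cpu once cpu is kept whole
    (["cpu/regs", "dma/ctrl"], 0.8),          # critical net between blocks
    (["dma/ctrl", "dma/fifo", "uart"], 4.5),  # relaxed multi-pin net
]

blocks = selectively_flatten(top, max_area=600, slack_limit=1.0)
hgr = to_hmetis(blocks, nets, slack_limit=1.0)
print([b.name for b in blocks])
print(hgr, end="")
```

In this toy case the timing-critical `cpu` instance stays intact as a single vertex, while `top` and `dma` are opened up, so the partitioner sees four vertices instead of five leaf cells. On a real design the saving comes from the many subtrees that never need to be expanded.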

