{"title":"Forward-scaling, serially equivalent parallelism for FPGA placement","authors":"C. Fobel, G. Grewal, D. Stacey","doi":"10.1145/2591513.2591543","DOIUrl":null,"url":null,"abstract":"Placement run-times continue to dominate the FPGA design flow. Previous attempts at parallel placement methods either only scale to a few threads or result in a significant loss in solution quality as thread-count is increased. We propose a novel method for generating large amounts of parallel work for placement, which scales with the size of the target architecture. Our experimental results show that we nearly reach the limit of the number of possible parallel swaps, while improving critical-path-delay 4.7% compared to VPR. While our proposed implementation currently utilizes a single thread, we still achieve speedups of 13.3x over VPR.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Great Lakes Symposium on VLSI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2591513.2591543","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Placement run-times continue to dominate the FPGA design flow. Previous attempts at parallel placement methods either only scale to a few threads or result in a significant loss in solution quality as thread-count is increased. We propose a novel method for generating large amounts of parallel work for placement, which scales with the size of the target architecture. Our experimental results show that we nearly reach the limit of the number of possible parallel swaps, while improving critical-path-delay 4.7% compared to VPR. While our proposed implementation currently utilizes a single thread, we still achieve speedups of 13.3x over VPR.