GPU Approach to FPGA Placement Based on Star+
C. Fobel, G. Grewal, Robert Collier, D. Stacey
10th IEEE International NEWCAS Conference, 2012-06-17
DOI: 10.1109/NEWCAS.2012.6328998 (https://doi.org/10.1109/NEWCAS.2012.6328998)
Citations: 4
Abstract
While simulated annealing is currently the most widely used method for FPGA placement, it does not scale to very large designs. Modern many-core architectures (including GPUs) offer a promising alternative to traditional multi-core processors for improving runtime performance. In this work, we propose a GPU-accelerated simulated-annealing variant for FPGA placement. Our approach combines the Star+ wirelength model with a novel method for efficiently generating large sets of independent swap operations, which provides a high level of parallelism. Speedups of 5.4× to 89.2× (median 20.2×) were achieved over a single-core, CPU-only implementation.
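The abstract does not describe how the independent swap sets are generated, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that "independent" means no placement slot appears in more than one swap, so all swaps in a set can be evaluated and applied concurrently without write conflicts. The function names `independent_swaps` and `apply_swaps` are hypothetical.

```python
import random


def independent_swaps(num_slots, seed=None):
    """Return (a, b) slot pairs in which every slot appears at most once.

    Assumption: slot-disjointness is the independence criterion; the
    paper's actual generation scheme is not given in the abstract.
    """
    rng = random.Random(seed)
    slots = list(range(num_slots))
    rng.shuffle(slots)
    # Pair consecutive entries of the shuffled order; with an odd
    # number of slots, the last one is left out of this set.
    return [(slots[i], slots[i + 1]) for i in range(0, num_slots - 1, 2)]


def apply_swaps(placement, swaps, accept):
    """Apply the accepted swaps in place; placement[slot] holds a block id."""
    for (a, b), ok in zip(swaps, accept):
        if ok:
            placement[a], placement[b] = placement[b], placement[a]
    return placement
```

Because the pairs are slot-disjoint, the result of `apply_swaps` does not depend on the order in which the pairs are processed, which is what makes one-thread-per-swap execution possible in principle. Note that slot-disjoint swaps can still interact through shared nets, so cost deltas computed concurrently may only approximate the sequential result.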