{"title":"Parallel multi-level analytical global placement on graphics processing units","authors":"J. Cong, Yi Zou","doi":"10.1145/1687399.1687525","DOIUrl":null,"url":null,"abstract":"GPU platforms are becoming increasingly attractive for implementing accelerators because they feature a larger number of cores with improved programmability. In this paper, we describe our implementation of a state-of-the-art academic multi-level analytical placer mPL on Nvidia's massively parallel GT200 series platforms. We detail our efforts on performance tuning and optimizations. When compared to software implementation on Intel's recent generation Xeon CPU, the speed of the global placement part of mPL is 15× faster on average using a Tesla C1060 card, with comparable WL. (less than 1% WL degradation on average).","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"244 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1687399.1687525","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 32
Abstract
GPU platforms are becoming increasingly attractive for implementing accelerators because they feature a larger number of cores with improved programmability. In this paper, we describe our implementation of a state-of-the-art academic multi-level analytical placer mPL on Nvidia's massively parallel GT200 series platforms. We detail our efforts on performance tuning and optimizations. When compared to software implementation on Intel's recent generation Xeon CPU, the speed of the global placement part of mPL is 15× faster on average using a Tesla C1060 card, with comparable WL. (less than 1% WL degradation on average).