Ping Xiang, Yi Yang, Mike Mantor, Norman Rubin, Huiyang Zhou
{"title":"面向吞吐量的GPGPU架构的ILP设计重述","authors":"Ping Xiang, Yi Yang, Mike Mantor, Norman Rubin, Huiyang Zhou","doi":"10.1109/CCGRID.2015.14","DOIUrl":null,"url":null,"abstract":"Many-core architectures such as graphics processing units (GPUs) rely on thread-level parallelism (TLP)to overcome pipeline hazards. Consequently, each core in a many-core processor employs a relatively simple in-order pipeline with limited capability to exploit instruction-level parallelism (ILP). In this paper, we study the ILP impact on the throughput-oriented many-core architecture, including data bypassing, score boarding and branch prediction. We show that these ILP techniques significantly reduce the performance dependency on TLP. This is especially useful for applications, whose resource usage limits the hardware to run a high number of threads concurrently. Furthermore, ILP techniques reduce the demand on on-chip resource to support high TLP. Given the workload-dependent impact from ILP, we propose heterogeneous GPGPU architecture, consisting of both the cores designed for high TLP and those customized with ILPtechniques. Our results show that our heterogeneous GPUarchitecture achieves high throughput as well as high energy and area-efficiency compared to homogenous designs.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"20 1","pages":"121-130"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture\",\"authors\":\"Ping Xiang, Yi Yang, Mike Mantor, Norman Rubin, Huiyang Zhou\",\"doi\":\"10.1109/CCGRID.2015.14\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many-core architectures such as graphics processing units (GPUs) rely on thread-level parallelism (TLP)to overcome pipeline hazards. Consequently, each core in a many-core processor employs a relatively simple in-order pipeline with limited capability to exploit instruction-level parallelism (ILP). In this paper, we study the ILP impact on the throughput-oriented many-core architecture, including data bypassing, score boarding and branch prediction. We show that these ILP techniques significantly reduce the performance dependency on TLP. This is especially useful for applications, whose resource usage limits the hardware to run a high number of threads concurrently. Furthermore, ILP techniques reduce the demand on on-chip resource to support high TLP. Given the workload-dependent impact from ILP, we propose heterogeneous GPGPU architecture, consisting of both the cores designed for high TLP and those customized with ILPtechniques. Our results show that our heterogeneous GPUarchitecture achieves high throughput as well as high energy and area-efficiency compared to homogenous designs.\",\"PeriodicalId\":6664,\"journal\":{\"name\":\"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"volume\":\"20 1\",\"pages\":\"121-130\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2015.14\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2015.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture
Many-core architectures such as graphics processing units (GPUs) rely on thread-level parallelism (TLP)to overcome pipeline hazards. Consequently, each core in a many-core processor employs a relatively simple in-order pipeline with limited capability to exploit instruction-level parallelism (ILP). In this paper, we study the ILP impact on the throughput-oriented many-core architecture, including data bypassing, score boarding and branch prediction. We show that these ILP techniques significantly reduce the performance dependency on TLP. This is especially useful for applications, whose resource usage limits the hardware to run a high number of threads concurrently. Furthermore, ILP techniques reduce the demand on on-chip resource to support high TLP. Given the workload-dependent impact from ILP, we propose heterogeneous GPGPU architecture, consisting of both the cores designed for high TLP and those customized with ILPtechniques. Our results show that our heterogeneous GPUarchitecture achieves high throughput as well as high energy and area-efficiency compared to homogenous designs.