{"title":"基于跨核资源共享和核心重构的高效GPGPU计算","authors":"Ashutosh Dhar, Deming Chen","doi":"10.1109/FCCM.2017.59","DOIUrl":null,"url":null,"abstract":"GPUs are capable of running a variety of applications, however their generic parallel-architecture can lead to inefficient use of resources and reduced power efficiency, due to algorithmic or architectural constraints. In this work, taking inspiration from CGRAs (coarse-grained reconfigurable architectures), we demonstrate resource sharing and re-distribution as a solution that can be leveraged by reconfiguring the GPU on a kernel-by-kernel basis. We explore four different schemes that trade the number of active SMs (streaming multiprocessor) for increased occupancy and local memory resources per SM and demonstrate improved power and energy with limited impact to performance. Our most aggressive scheme, BigSM, is capable of saving energy by up to 54%, and 26% on an average.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"357 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Efficient GPGPU Computing with Cross-Core Resource Sharing and Core Reconfiguration\",\"authors\":\"Ashutosh Dhar, Deming Chen\",\"doi\":\"10.1109/FCCM.2017.59\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GPUs are capable of running a variety of applications, however their generic parallel-architecture can lead to inefficient use of resources and reduced power efficiency, due to algorithmic or architectural constraints. In this work, taking inspiration from CGRAs (coarse-grained reconfigurable architectures), we demonstrate resource sharing and re-distribution as a solution that can be leveraged by reconfiguring the GPU on a kernel-by-kernel basis. We explore four different schemes that trade the number of active SMs (streaming multiprocessor) for increased occupancy and local memory resources per SM and demonstrate improved power and energy with limited impact to performance. Our most aggressive scheme, BigSM, is capable of saving energy by up to 54%, and 26% on an average.\",\"PeriodicalId\":124631,\"journal\":{\"name\":\"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"volume\":\"357 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FCCM.2017.59\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2017.59","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient GPGPU Computing with Cross-Core Resource Sharing and Core Reconfiguration
GPUs are capable of running a variety of applications, however their generic parallel-architecture can lead to inefficient use of resources and reduced power efficiency, due to algorithmic or architectural constraints. In this work, taking inspiration from CGRAs (coarse-grained reconfigurable architectures), we demonstrate resource sharing and re-distribution as a solution that can be leveraged by reconfiguring the GPU on a kernel-by-kernel basis. We explore four different schemes that trade the number of active SMs (streaming multiprocessor) for increased occupancy and local memory resources per SM and demonstrate improved power and energy with limited impact to performance. Our most aggressive scheme, BigSM, is capable of saving energy by up to 54%, and 26% on an average.