Yu-Hao Huang, Ying-Yu Tseng, Hsien-Kai Kuo, Ta-Kan Yen, B. Lai
{"title":"A Locality-Aware Dynamic Thread Scheduler for GPGPUs","authors":"Yu-Hao Huang, Ying-Yu Tseng, Hsien-Kai Kuo, Ta-Kan Yen, B. Lai","doi":"10.1109/PDCAT.2013.46","DOIUrl":null,"url":null,"abstract":"Modern GPGPUs implement on-chip shared cache to better exploit the data reuse of various general purpose applications. Given the massive amount of concurrent threads in a GPGPU, striking the balance between Data Locality and Load Balance has become a critical design concern. To achieve the best performance, the trade-off between these two factors needs to be performed concurrently. This paper proposes a dynamic thread scheduler which co-optimizes both the data locality and load balance on a GPGPU. The proposed approach is evaluated using three applications with various input datasets. The results show that the proposed approach reduces the overall execution cycles by up to 16% when compared with other approaches concerning only one objective.","PeriodicalId":187974,"journal":{"name":"2013 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"216 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Parallel and Distributed Computing, Applications and Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDCAT.2013.46","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Modern GPGPUs implement on-chip shared cache to better exploit the data reuse of various general purpose applications. Given the massive amount of concurrent threads in a GPGPU, striking the balance between Data Locality and Load Balance has become a critical design concern. To achieve the best performance, the trade-off between these two factors needs to be performed concurrently. This paper proposes a dynamic thread scheduler which co-optimizes both the data locality and load balance on a GPGPU. The proposed approach is evaluated using three applications with various input datasets. The results show that the proposed approach reduces the overall execution cycles by up to 16% when compared with other approaches concerning only one objective.