{"title":"动态经纱调整:高性能SIMT的分析与效益","authors":"Ahmad Lashgar, A. Baniasadi, A. Khonsari","doi":"10.1109/ICCD.2012.6378694","DOIUrl":null,"url":null,"abstract":"Modern GPUs synchronize threads grouped in warps. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with branch and memory divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also increase branch and memory divergence. Dynamic workload behavior, including branch/memory divergence and coalescing, is an important factor in determining the warp size returning best performance. Based on this observation, we propose Dynamic Warp Resizing (DWR). DWR outperforms static warp size decisions, up to 2.28X.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Dynamic warp resizing: Analysis and benefits in high-performance SIMT\",\"authors\":\"Ahmad Lashgar, A. Baniasadi, A. Khonsari\",\"doi\":\"10.1109/ICCD.2012.6378694\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern GPUs synchronize threads grouped in warps. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with branch and memory divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also increase branch and memory divergence. Dynamic workload behavior, including branch/memory divergence and coalescing, is an important factor in determining the warp size returning best performance. Based on this observation, we propose Dynamic Warp Resizing (DWR). DWR outperforms static warp size decisions, up to 2.28X.\",\"PeriodicalId\":313428,\"journal\":{\"name\":\"2012 IEEE 30th International Conference on Computer Design (ICCD)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 30th International Conference on Computer Design (ICCD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2012.6378694\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 30th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2012.6378694","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dynamic warp resizing: Analysis and benefits in high-performance SIMT
Modern GPUs synchronize threads grouped in warps. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with branch and memory divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also increase branch and memory divergence. Dynamic workload behavior, including branch/memory divergence and coalescing, is an important factor in determining the warp size returning best performance. Based on this observation, we propose Dynamic Warp Resizing (DWR). DWR outperforms static warp size decisions, up to 2.28X.