Understanding Strong Scaling on GPUs Using Empirical Performance Saturation Size
Authors: David Eberius, P. Roth, D. Rogers
Venue: 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)
Published: 2022-11-01
DOI: 10.1109/P3HPC56579.2022.00008 (https://doi.org/10.1109/P3HPC56579.2022.00008)
The roofline model provides a concise overview of the maximum performance a given computer system can attain by combining its peak memory bandwidth with its peak compute rate. The increasing complexity of scheduling and caching in recent GPUs, however, has introduced performance variability that arithmetic intensity alone does not capture. This work examines the effect of problem size and GPU launch configuration on roofline performance for the V100, A100, MI100, and MI250X graphics processing units. We introduce an extended roofline model that takes problem size into account, and find that strong scaling on GPUs can be characterized by using saturation problem sizes as additional key metrics. Saturation problem sizes divide a plot of GPU performance versus problem size into three distinct performance regimes: size-limited, cache-bound, and DRAM-bound. With our extended roofline model, we provide a robust view of these performance regimes across recent GPU architectures.
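To make the terminology concrete, the following is a minimal sketch and not the authors' model. It shows the classic roofline bound (the smaller of the compute peak and arithmetic intensity times memory bandwidth) and a simple way to label a problem size with one of the three regimes named above. The function names, the two threshold parameters, the assumption that the regimes appear in order of increasing problem size, and every number used are hypothetical; the abstract describes saturation sizes as empirically determined and does not give values.

```python
def roofline(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Classic roofline bound in FLOP/s.

    arithmetic_intensity: FLOPs per byte moved
    peak_flops:           peak compute rate in FLOP/s
    peak_bandwidth:       peak memory bandwidth in bytes/s
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


def classify_regime(problem_size, first_saturation_size, second_saturation_size):
    """Label a problem size with one of the three regimes from the abstract.

    Both thresholds are hypothetical inputs for this sketch. It assumes the
    regimes occur in order of increasing problem size: size-limited below the
    first threshold, cache-bound between the two, and DRAM-bound above the
    second.
    """
    if first_saturation_size > second_saturation_size:
        raise ValueError("first_saturation_size must not exceed second_saturation_size")
    if problem_size < first_saturation_size:
        return "size-limited"
    if problem_size < second_saturation_size:
        return "cache-bound"
    return "DRAM-bound"


if __name__ == "__main__":
    # Illustrative placeholder peaks, not measurements of any real GPU.
    peak_flops = 10e12      # 10 TFLOP/s
    peak_bandwidth = 1e12   # 1 TB/s

    for ai in (0.5, 10.0, 100.0):
        bound = roofline(ai, peak_flops, peak_bandwidth)
        print(f"AI = {ai:6.1f} FLOP/byte -> bound = {bound:.3e} FLOP/s")

    # Illustrative placeholder thresholds in bytes.
    for size in (1e5, 1e7, 1e9):
        print(f"size = {size:.0e} bytes -> {classify_regime(size, 1e6, 4e7)}")
```

In practice the two thresholds would be read off measured curves of performance versus problem size for each GPU and launch configuration, rather than fixed in advance as they are here.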