A Taxonomy of GPGPU Performance Scaling

2015 IEEE International Symposium on Workload Characterization. Published 2015-10-04. DOI: 10.1109/IISWC.2015.22

Abhinandan Majumdar, Gene Y. Wu, K. Dev, J. Greathouse, Indrani Paul, Wei Huang, Arjun Venugopal, Leonardo Piga, Chip Freitag, Sooraj Puthoor


Citations: 13

Abstract

Graphics processing units (GPUs) range from small, embedded designs to large, high-powered discrete cards. While the performance of graphics workloads is generally understood, there has been little study of the performance of GPGPU applications across a variety of hardware configurations. This work presents performance scaling data gathered for 267 GPGPU kernels from 97 programs run on 891 hardware configurations of a modern GPU. We study the performance of these kernels across a 5× change in core frequency, 8.3× change in memory bandwidth, and 11× difference in compute units. We illustrate that many kernels scale in intuitive ways, such as those that scale directly with added computational capabilities or memory bandwidth. We also find a number of kernels that scale in non-obvious ways, such as losing performance when more processing units are added or plateauing as frequency and bandwidth are increased. In addition, we show that a number of current benchmark suites do not scale to modern GPU sizes, implying that either new benchmarks or new inputs are warranted.
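The scaling behaviors named in the abstract (gaining performance in proportion to added compute or bandwidth, plateauing, or losing performance as more compute units are added) can be made concrete with a small classifier over a sweep of one hardware knob. This is a minimal sketch, not the authors' taxonomy or method: the function `classify_scaling`, its label names, and the 5% / 80% thresholds are hypothetical choices for illustration. The sketch also assumes that "ideal" scaling means runtime falls in proportion to the knob's value.

```python
from typing import Sequence


def classify_scaling(
    settings: Sequence[float],
    runtimes: Sequence[float],
    tol: float = 0.05,        # hypothetical noise margin (5%)
    linear_eff: float = 0.8,  # hypothetical "near-ideal" threshold (80%)
) -> str:
    """Label how one kernel's runtime responds to sweeping ONE hardware knob.

    settings: strictly increasing knob values (e.g. CU count, core MHz).
    runtimes: measured kernel time at each setting (lower is better).
    Labels and thresholds are illustrative assumptions only.
    """
    if len(settings) != len(runtimes) or len(settings) < 3:
        raise ValueError("need at least 3 paired measurements")
    if settings[0] <= 0 or any(b <= a for a, b in zip(settings, settings[1:])):
        raise ValueError("settings must be positive and strictly increasing")

    # Speedup of every point relative to the smallest configuration.
    speedup = [runtimes[0] / t for t in runtimes]
    # Speedup we would see if runtime were proportional to 1 / knob.
    ideal = settings[-1] / settings[0]
    final, peak = speedup[-1], max(speedup)

    if final < peak * (1 - tol):
        return "degrades"     # adding resources made it slower than its best point
    if peak < 1 + tol:
        return "insensitive"  # this knob barely affects runtime
    if (final - 1) / (ideal - 1) >= linear_eff:
        return "linear"       # captures most of the ideal proportional gain
    mid = speedup[len(speedup) // 2]
    if final < mid * (1 + tol):
        return "plateau"      # upper half of the sweep adds almost nothing
    return "sublinear"        # keeps improving, but well below ideal


if __name__ == "__main__":
    cus = [4, 8, 16, 32, 44]  # hypothetical compute-unit sweep
    print(classify_scaling(cus, [10.0, 5.0, 2.5, 1.25, 0.92]))  # linear
    print(classify_scaling(cus, [10.0, 6.0, 5.0, 4.9, 4.9]))    # plateau
    print(classify_scaling(cus, [10.0, 6.0, 5.0, 6.0, 7.0]))    # degrades
```

Checking the "degrades" case first means a kernel that speeds up and then slows down is not mislabeled as a plateau. A real study would apply the same kind of per-knob analysis separately to core frequency, memory bandwidth, and compute-unit count, and would need thresholds chosen against measurement noise.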

