Nilanjan Goswami, Ramkumar Shankar, Madhura Joshi, Tao Li
{"title":"探索GPGPU工作负载:表征方法、分析和微架构评估含义","authors":"Nilanjan Goswami, Ramkumar Shankar, Madhura Joshi, Tao Li","doi":"10.1109/IISWC.2010.5649549","DOIUrl":null,"url":null,"abstract":"The GPUs are emerging as a general-purpose high-performance computing device. Growing GPGPU research has made numerous GPGPU workloads available. However, a systematic approach to characterize these benchmarks and analyze their implication on GPU microarchitecture design evaluation is still lacking. In this research, we propose a set of microarchitecture agnostic GPGPU workload characteristics to represent them in a microarchitecture independent space. Correlated dimensionality reduction process and clustering analysis are used to understand these workloads. In addition, we propose a set of evaluation metrics to accurately evaluate the GPGPU design space. With growing number of GPGPU workloads, this approach of analysis provides meaningful, accurate and thorough simulation for a proposed GPU architecture design choice. Architects also benefit by choosing a set of workloads to stress their intended functional block of the GPU microarchitecture. We present a diversity analysis of GPU benchmark suites such as Nvidia CUDA SDK, Parboil and Rodinia. Our results show that with a large number of diverse kernels, workloads such as Similarity Score, Parallel Reduction, and Scan of Large Arrays show diverse characteristics in different workload spaces. We have also explored diversity in different workload subspaces (e.g. memory coalescing and branch divergence). Similarity Score, Scan of Large Arrays, MUMmerGPU, Hybrid Sort, and Nearest Neighbor workloads exhibit relatively large variation in branch divergence characteristics compared to others. Memory coalescing behavior is diverse in Scan of Large Arrays, K-Means, Similarity Score and Parallel Reduction.","PeriodicalId":107589,"journal":{"name":"IEEE International Symposium on Workload Characterization (IISWC'10)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"77","resultStr":"{\"title\":\"Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications\",\"authors\":\"Nilanjan Goswami, Ramkumar Shankar, Madhura Joshi, Tao Li\",\"doi\":\"10.1109/IISWC.2010.5649549\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The GPUs are emerging as a general-purpose high-performance computing device. Growing GPGPU research has made numerous GPGPU workloads available. However, a systematic approach to characterize these benchmarks and analyze their implication on GPU microarchitecture design evaluation is still lacking. In this research, we propose a set of microarchitecture agnostic GPGPU workload characteristics to represent them in a microarchitecture independent space. Correlated dimensionality reduction process and clustering analysis are used to understand these workloads. In addition, we propose a set of evaluation metrics to accurately evaluate the GPGPU design space. With growing number of GPGPU workloads, this approach of analysis provides meaningful, accurate and thorough simulation for a proposed GPU architecture design choice. Architects also benefit by choosing a set of workloads to stress their intended functional block of the GPU microarchitecture. We present a diversity analysis of GPU benchmark suites such as Nvidia CUDA SDK, Parboil and Rodinia. Our results show that with a large number of diverse kernels, workloads such as Similarity Score, Parallel Reduction, and Scan of Large Arrays show diverse characteristics in different workload spaces. We have also explored diversity in different workload subspaces (e.g. memory coalescing and branch divergence). Similarity Score, Scan of Large Arrays, MUMmerGPU, Hybrid Sort, and Nearest Neighbor workloads exhibit relatively large variation in branch divergence characteristics compared to others. Memory coalescing behavior is diverse in Scan of Large Arrays, K-Means, Similarity Score and Parallel Reduction.\",\"PeriodicalId\":107589,\"journal\":{\"name\":\"IEEE International Symposium on Workload Characterization (IISWC'10)\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"77\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Symposium on Workload Characterization (IISWC'10)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IISWC.2010.5649549\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Symposium on Workload Characterization (IISWC'10)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2010.5649549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 77
摘要
gpu是一种新兴的通用高性能计算设备。不断增长的GPGPU研究使得许多GPGPU工作负载可用。然而,目前还缺乏一种系统的方法来描述这些基准测试并分析它们对GPU微架构设计评估的影响。在这项研究中,我们提出了一组与微架构无关的GPGPU工作负载特征,以在微架构独立的空间中表示它们。使用相关降维过程和聚类分析来理解这些工作负载。此外,我们还提出了一套评估指标来准确评估GPGPU的设计空间。随着GPGPU工作负载的不断增加,这种分析方法为提出的GPU架构设计选择提供了有意义、准确和全面的仿真。架构师还可以通过选择一组工作负载来强调其GPU微架构的预期功能块而受益。我们提出了GPU基准套件的多样性分析,如Nvidia CUDA SDK, Parboil和Rodinia。我们的研究结果表明,使用大量不同的内核,相似度评分、并行减少和大阵列扫描等工作负载在不同的工作负载空间中表现出不同的特征。我们还探讨了不同工作负载子空间的多样性(例如内存合并和分支发散)。与其他工作负载相比,相似性评分、大数组扫描、MUMmerGPU、混合排序和最近邻工作负载在分支发散特征上表现出相对较大的变化。在大阵列扫描、K-Means、相似性评分和并行约简中,内存合并行为是多种多样的。
Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications
The GPUs are emerging as a general-purpose high-performance computing device. Growing GPGPU research has made numerous GPGPU workloads available. However, a systematic approach to characterize these benchmarks and analyze their implication on GPU microarchitecture design evaluation is still lacking. In this research, we propose a set of microarchitecture agnostic GPGPU workload characteristics to represent them in a microarchitecture independent space. Correlated dimensionality reduction process and clustering analysis are used to understand these workloads. In addition, we propose a set of evaluation metrics to accurately evaluate the GPGPU design space. With growing number of GPGPU workloads, this approach of analysis provides meaningful, accurate and thorough simulation for a proposed GPU architecture design choice. Architects also benefit by choosing a set of workloads to stress their intended functional block of the GPU microarchitecture. We present a diversity analysis of GPU benchmark suites such as Nvidia CUDA SDK, Parboil and Rodinia. Our results show that with a large number of diverse kernels, workloads such as Similarity Score, Parallel Reduction, and Scan of Large Arrays show diverse characteristics in different workload spaces. We have also explored diversity in different workload subspaces (e.g. memory coalescing and branch divergence). Similarity Score, Scan of Large Arrays, MUMmerGPU, Hybrid Sort, and Nearest Neighbor workloads exhibit relatively large variation in branch divergence characteristics compared to others. Memory coalescing behavior is diverse in Scan of Large Arrays, K-Means, Similarity Score and Parallel Reduction.