候选HPC计算构建块上的算法时间、能量和功率

2014 IEEE 28th International Parallel and Distributed Processing Symposium Pub Date : 2014-05-19 DOI:10.1109/IPDPS.2014.54

JeeWhan Choi, Marat Dukhan, Xing Liu, R. Vuduc

{"title":"候选HPC计算构建块上的算法时间、能量和功率","authors":"JeeWhan Choi, Marat Dukhan, Xing Liu, R. Vuduc","doi":"10.1109/IPDPS.2014.54","DOIUrl":null,"url":null,"abstract":"We conducted a micro benchmarking study of the time, energy, and power of computation and memory access on several existing platforms. These platforms represent candidate compute-node building blocks of future high-performance computing systems. Our analysis uses the \"energy roofline\" model, developed in prior work, which we extend in two ways. First, we improve the model's accuracy by accounting for power caps, basic memory hierarchy access costs, and measurement of random memory access patterns. Secondly, we empirically evaluate server-, mini-, and mobile-class platforms that span a range of compute and power characteristics. Our study includes a dozen such platforms, including x86 (both conventional and Xeon Phi), ARM, GPU, and hybrid (AMD APU and other SoC) processors. These data and our model analytically characterize the range of algorithmic regimes where we might prefer one building block to others. It suggests critical values of arithmetic intensity around which some systems may switch from being more to less time- and energy-efficient than others, it further suggests how, with respect to intensity, operations should be throttled to meet a power cap. We hope our methods can help make debates about the relative merits of these and other systems more quantitative, analytical, and insightful.","PeriodicalId":309291,"journal":{"name":"2014 IEEE 28th International Parallel and Distributed Processing Symposium","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"76","resultStr":"{\"title\":\"Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks\",\"authors\":\"JeeWhan Choi, Marat Dukhan, Xing Liu, R. Vuduc\",\"doi\":\"10.1109/IPDPS.2014.54\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We conducted a micro benchmarking study of the time, energy, and power of computation and memory access on several existing platforms. These platforms represent candidate compute-node building blocks of future high-performance computing systems. Our analysis uses the \\\"energy roofline\\\" model, developed in prior work, which we extend in two ways. First, we improve the model's accuracy by accounting for power caps, basic memory hierarchy access costs, and measurement of random memory access patterns. Secondly, we empirically evaluate server-, mini-, and mobile-class platforms that span a range of compute and power characteristics. Our study includes a dozen such platforms, including x86 (both conventional and Xeon Phi), ARM, GPU, and hybrid (AMD APU and other SoC) processors. These data and our model analytically characterize the range of algorithmic regimes where we might prefer one building block to others. It suggests critical values of arithmetic intensity around which some systems may switch from being more to less time- and energy-efficient than others, it further suggests how, with respect to intensity, operations should be throttled to meet a power cap. We hope our methods can help make debates about the relative merits of these and other systems more quantitative, analytical, and insightful.\",\"PeriodicalId\":309291,\"journal\":{\"name\":\"2014 IEEE 28th International Parallel and Distributed Processing Symposium\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"76\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE 28th International Parallel and Distributed Processing Symposium\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2014.54\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 28th International Parallel and Distributed Processing Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2014.54","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 76

摘要

我们对几个现有平台上的计算和内存访问的时间、精力和能力进行了微观基准测试研究。这些平台代表了未来高性能计算系统的候选计算节点构建块。我们的分析使用了在先前工作中开发的“能量屋顶线”模型，我们将其扩展为两种方式。首先，我们通过考虑功率上限、基本内存层次访问成本和随机内存访问模式的测量来提高模型的准确性。其次，我们对服务器级、迷你级和移动级平台进行了实证评估，这些平台涵盖了一系列计算和功率特性。我们的研究包括十几个这样的平台，包括x86(传统和Xeon Phi)、ARM、GPU和混合(AMD APU和其他SoC)处理器。这些数据和我们的模型分析地描述了算法制度的范围，我们可能更喜欢一个构建块而不是其他。它提出了算术强度的临界值，围绕这个临界值，一些系统可能会从比其他系统更节省时间和能源，它进一步提出了如何，就强度而言，操作应该被限制以满足功率上限。我们希望我们的方法可以帮助使关于这些和其他系统的相对优点的辩论更具定量、分析性和洞察力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks

We conducted a micro benchmarking study of the time, energy, and power of computation and memory access on several existing platforms. These platforms represent candidate compute-node building blocks of future high-performance computing systems. Our analysis uses the "energy roofline" model, developed in prior work, which we extend in two ways. First, we improve the model's accuracy by accounting for power caps, basic memory hierarchy access costs, and measurement of random memory access patterns. Secondly, we empirically evaluate server-, mini-, and mobile-class platforms that span a range of compute and power characteristics. Our study includes a dozen such platforms, including x86 (both conventional and Xeon Phi), ARM, GPU, and hybrid (AMD APU and other SoC) processors. These data and our model analytically characterize the range of algorithmic regimes where we might prefer one building block to others. It suggests critical values of arithmetic intensity around which some systems may switch from being more to less time- and energy-efficient than others, it further suggests how, with respect to intensity, operations should be throttled to meet a power cap. We hope our methods can help make debates about the relative merits of these and other systems more quantitative, analytical, and insightful.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 IEEE 28th International Parallel and Distributed Processing Symposium

自引率

0.00%

发文量