A design methodology for domain-optimized power-efficient supercomputing

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. Pub Date: 2009-11-14. DOI: 10.1145/1654059.1654072

M. Mohiyuddin, M. Murphy, L. Oliker, J. Shalf, J. Wawrzynek, Samuel Williams


Citations: 19

Abstract

As power has become the pre-eminent design constraint for future HPC systems, computational efficiency is being emphasized over peak performance alone. Recently, static benchmark codes have been used to find a power-efficient architecture. Unfortunately, because compilers generate sub-optimal code, benchmark performance can be a poor indicator of the performance potential of architecture design points. Therefore, we present hardware/software co-tuning as a novel approach for system design, in which traditional architecture space exploration is tightly coupled with software auto-tuning to deliver substantial improvements in area and power efficiency. We demonstrate the proposed methodology by exploring the parameter space of a Tensilica-based multi-processor running three of the most heavily used kernels in scientific computing, each with widely varying micro-architectural requirements: sparse matrix-vector multiplication, stencil-based computations, and general matrix-matrix multiplication. Results demonstrate that co-tuning significantly improves hardware area and energy efficiency, a key driver for the next generation of HPC system design.
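The co-tuning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' tool flow: the design space, the `toy_model` cost function, the `BANDWIDTH_CAP` constant, and all parameter names are hypothetical stand-ins for the simulation and measurement a real study would use. The sketch only contrasts two search strategies: exploring hardware with one fixed code variant, and re-tuning the software for every hardware point before comparing designs.

```python
from itertools import product

# Hypothetical design spaces; names and values are illustrative only.
HW_SPACE = {"cores": [1, 2, 4], "cache_kb": [16, 64]}
SW_SPACE = {"block_kb": [8, 32, 128]}  # software knob, e.g. a cache-blocking size
BANDWIDTH_CAP = 64.0                   # made-up ceiling on achievable performance


def configs(space):
    """Yield every combination of parameter values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def toy_model(hw, sw):
    """Made-up analytic stand-in for simulation; returns (performance, power)."""
    fits = sw["block_kb"] <= hw["cache_kb"]
    reuse = sw["block_kb"] if fits else 1  # blocking helps only if the block fits in cache
    perf = min(hw["cores"] * reuse, BANDWIDTH_CAP)
    power = 2.0 + hw["cores"] * (1.0 + hw["cache_kb"] / 32)
    return perf, power


def efficiency(hw, sw):
    """Performance per unit power for one hardware/software pair."""
    perf, power = toy_model(hw, sw)
    return perf / power


def explore_fixed(sw):
    """Hardware exploration with a single, untuned code variant."""
    return max(((efficiency(hw, sw), hw) for hw in configs(HW_SPACE)),
               key=lambda t: t[0])


def co_tune():
    """Auto-tune the software for each hardware point, then pick the best pair."""
    best = None
    for hw in configs(HW_SPACE):
        eff, sw = max(((efficiency(hw, sw), sw) for sw in configs(SW_SPACE)),
                      key=lambda t: t[0])
        if best is None or eff > best[0]:
            best = (eff, hw, sw)
    return best


if __name__ == "__main__":
    print("fixed code:", explore_fixed({"block_kb": 8}))
    print("co-tuned:  ", co_tune())
```

In this toy model the fixed-code search and the co-tuned search settle on different hardware points, which is the kind of effect the abstract attributes to judging designs with untuned benchmark code. The numbers themselves carry no meaning beyond the illustration.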



