StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI:10.1109/ipdps53621.2022.00090

Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Zhongzhi Luan, D. Qian

{"title":"StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs","authors":"Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Zhongzhi Luan, D. Qian","doi":"10.1109/ipdps53621.2022.00090","DOIUrl":null,"url":null,"abstract":"Stencil computations are widely used in high performance computing (HPC) applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of stencil order, memory accesses and computation patterns. To adapt diverse stencils to GPUs, a variety of optimization techniques have been proposed such as streaming and retiming. However, due to the diversity of stencil patterns and GPU architectures, no single optimization technique fits all stencils. Besides, it is challenging to choose the most cost-efficient GPU for accelerating target stencils. To address the above problems, we propose StencilMART, an automatic optimization selection framework that predicts the best optimization combination and execution time under a certain parameter setting for stencils on GPUs. Specifically, the StencilMART represents the stencil patterns as binary tensors and neighboring features through tensor assignment and feature extraction. In addition, the StencilMART implements various machine learning methods such as classification and regression that utilize stencil representation and hardware characteristics for execution time prediction. The experiment results show that the StencilMART can achieve accurate optimization selection and performance prediction for various stencils across GPUs.","PeriodicalId":321801,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ipdps53621.2022.00090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

Abstract

Stencil computations are widely used in high performance computing (HPC) applications. Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil computations. In recent years, stencils have become more diverse in terms of stencil order, memory accesses and computation patterns. To adapt diverse stencils to GPUs, a variety of optimization techniques have been proposed such as streaming and retiming. However, due to the diversity of stencil patterns and GPU architectures, no single optimization technique fits all stencils. Besides, it is challenging to choose the most cost-efficient GPU for accelerating target stencils. To address the above problems, we propose StencilMART, an automatic optimization selection framework that predicts the best optimization combination and execution time under a certain parameter setting for stencils on GPUs. Specifically, the StencilMART represents the stencil patterns as binary tensors and neighboring features through tensor assignment and feature extraction. In addition, the StencilMART implements various machine learning methods such as classification and regression that utilize stencil representation and hardware characteristics for execution time prediction. The experiment results show that the StencilMART can achieve accurate optimization selection and performance prediction for various stencils across GPUs.

查看原文本刊更多论文

StencilMART:预测跨gpu的模板计算的优化选择

模板计算在高性能计算(HPC)中有着广泛的应用。许多高性能计算平台利用gpu的高计算能力来加速模板计算。近年来，模板在模板顺序、内存访问和计算模式方面变得更加多样化。为了使不同的模板适应gpu，提出了各种优化技术，如流和重定时。然而，由于模板模式和GPU架构的多样性，没有一种优化技术适合所有的模板。此外，如何选择最具成本效益的GPU来加速目标模板也是一个挑战。为了解决上述问题，我们提出了StencilMART，这是一个自动优化选择框架，可以预测gpu上模板在一定参数设置下的最佳优化组合和执行时间。具体来说，StencilMART通过张量分配和特征提取将模板模式表示为二值张量和相邻特征。此外，StencilMART实现了各种机器学习方法，如分类和回归，利用模板表示和硬件特征进行执行时间预测。实验结果表明，StencilMART可以实现跨gpu的各种模板的准确优化选择和性能预测。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量