异构架构下DFT计算的性能建模与调优

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2021-06-01 DOI:10.1109/IPDPSW52791.2021.00108

H. Ahmed, David B. Williams-Young, K. Ibrahim, Chao Yang

{"title":"异构架构下DFT计算的性能建模与调优","authors":"H. Ahmed, David B. Williams-Young, K. Ibrahim, Chao Yang","doi":"10.1109/IPDPSW52791.2021.00108","DOIUrl":null,"url":null,"abstract":"Tuning scientific code for heterogeneous computing architecture is a growing challenge. Not only do we need to tune the code to multiple architectures, but also we need to select or schedule computations to the most efficient compute variant. In this paper, we explore the tuning and performance modeling question of one of the most time computing kernels in density functional theory calculations on systems with a multicore host CPU accelerated with GPUs. We show the problem configuration dictates the choice of the most efficient compute engine. Such choice could alternate between the host and the accelerator, especially while scaling. As such, a performance model to predict the execution time on the host CPU and GPU is essential to select the compute environment and to achieve optimal performance. We present a simple model that empirically carry out such tasks and could accurately steer the scheduling of computation.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Performance Modeling and Tuning for DFT Calculations on Heterogeneous Architectures\",\"authors\":\"H. Ahmed, David B. Williams-Young, K. Ibrahim, Chao Yang\",\"doi\":\"10.1109/IPDPSW52791.2021.00108\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Tuning scientific code for heterogeneous computing architecture is a growing challenge. Not only do we need to tune the code to multiple architectures, but also we need to select or schedule computations to the most efficient compute variant. In this paper, we explore the tuning and performance modeling question of one of the most time computing kernels in density functional theory calculations on systems with a multicore host CPU accelerated with GPUs. We show the problem configuration dictates the choice of the most efficient compute engine. Such choice could alternate between the host and the accelerator, especially while scaling. As such, a performance model to predict the execution time on the host CPU and GPU is essential to select the compute environment and to achieve optimal performance. We present a simple model that empirically carry out such tasks and could accurately steer the scheduling of computation.\",\"PeriodicalId\":170832,\"journal\":{\"name\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW52791.2021.00108\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW52791.2021.00108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

为异构计算架构调优科学代码是一个日益严峻的挑战。我们不仅需要将代码调优到多个体系结构，还需要选择或调度计算到最有效的计算变体。在本文中，我们探讨了密度泛函理论计算中最耗时的计算内核之一在多核主机CPU和gpu加速系统上的调优和性能建模问题。我们展示了问题配置决定了最有效的计算引擎的选择。这种选择可以在主机和加速器之间交替进行，特别是在扩展时。因此，预测主机CPU和GPU上的执行时间的性能模型对于选择计算环境和实现最佳性能至关重要。我们提出了一个简单的模型，可以经验地执行这些任务，并能准确地指导计算的调度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Performance Modeling and Tuning for DFT Calculations on Heterogeneous Architectures

Tuning scientific code for heterogeneous computing architecture is a growing challenge. Not only do we need to tune the code to multiple architectures, but also we need to select or schedule computations to the most efficient compute variant. In this paper, we explore the tuning and performance modeling question of one of the most time computing kernels in density functional theory calculations on systems with a multicore host CPU accelerated with GPUs. We show the problem configuration dictates the choice of the most efficient compute engine. Such choice could alternate between the host and the accelerator, especially while scaling. As such, a performance model to predict the execution time on the host CPU and GPU is essential to select the compute environment and to achieve optimal performance. We present a simple model that empirically carry out such tasks and could accurately steer the scheduling of computation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

自引率

0.00%

发文量