Massively Parallel Automated Software Tuning

Proceedings of the 48th International Conference on Parallel Processing
Pub Date: 2019-08-05
DOI: 10.1145/3337821.3337908

J. Kurzak, Y. Tsai, M. Gates, A. Abdelfattah, J. Dongarra

Citations: 5

Abstract

This article presents an implementation of a distributed autotuning engine developed as part of the Bench-testing OpeN Software Autotuning Infrastructure project. The system targets performance optimization of computational kernels for graphics processing units (GPUs) and allows vast autotuning sweeps to be deployed on massively parallel machines. The software dynamically schedules work across distributed-memory resources, uses multithreading to compile kernels in parallel, and dispatches kernel launches to multiple accelerators. The article lays out the main design principles of the system and discusses the basic mechanics of the initial implementation. It also presents preliminary performance results, discusses the challenges encountered, and outlines future directions.
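The pipeline described in the abstract (parallel compilation feeding dynamically scheduled kernel launches on several accelerators) can be illustrated with a minimal single-node sketch. This is not the authors' implementation: the function names `compile_variant`, `run_on_device`, and `autotune`, the parameters `tile` and `unroll`, and the synthetic cost model are all hypothetical stand-ins, and the distributed-memory layer is not modeled. Compilation runs in a thread pool, and each simulated device pulls the next compiled variant from a shared queue as soon as it is idle.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def compile_variant(params):
    """Hypothetical stand-in for compiling one kernel variant."""
    name = f"kernel_t{params['tile']}_u{params['unroll']}"
    return {"params": params, "binary": name}


def run_on_device(device_id, compiled):
    """Hypothetical stand-in for launching and timing a kernel.

    Uses a synthetic cost model (an assumption, not measured data)
    whose minimum is at tile=32, unroll=4.
    """
    p = compiled["params"]
    return abs(p["tile"] - 32) + abs(p["unroll"] - 4) + 1.0


def autotune(search_space, num_devices=2, num_compile_threads=4):
    """Return (time, device_id, params) for the fastest variant."""
    ready = queue.Queue()      # compiled variants waiting for a device
    results = []
    results_lock = threading.Lock()

    def device_worker(device_id):
        # Dynamic scheduling: an idle device takes the next ready variant.
        while True:
            compiled = ready.get()
            if compiled is None:   # sentinel: no more work
                break
            elapsed = run_on_device(device_id, compiled)
            with results_lock:
                results.append((elapsed, device_id, compiled["params"]))

    workers = [
        threading.Thread(target=device_worker, args=(d,))
        for d in range(num_devices)
    ]
    for w in workers:
        w.start()

    # Compile variants in parallel and hand them to the device queue.
    with ThreadPoolExecutor(max_workers=num_compile_threads) as pool:
        for compiled in pool.map(compile_variant, search_space):
            ready.put(compiled)

    for _ in workers:              # one sentinel per device worker
        ready.put(None)
    for w in workers:
        w.join()

    return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    space = [
        {"tile": t, "unroll": u}
        for t in (8, 16, 32, 64)
        for u in (1, 2, 4, 8)
    ]
    print(autotune(space))
```

A shared queue is used here because it balances load when variants take different amounts of time to run; a static split of the search space across devices would leave some devices idle.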

