HPX Smart Executors

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware. Pub Date: 2017-11-05. DOI: 10.1145/3152041.3152084

Zahra Khatami, Lukas Troska, Hartmut Kaiser, J. Ramanujam, Adrian Serio


Citations: 7

Abstract

The performance of many parallel applications depends on loop-level parallelism. However, manually parallelizing every loop can degrade parallel performance, because some loops do not scale well to a large number of threads. In addition, the overhead of manually tuning loop parameters can keep an application from reaching its maximum parallel performance. We illustrate how machine learning techniques can be applied to address these challenges. In this research, we develop a framework that automatically captures the static and dynamic information of a loop. Moreover, we propose a novel method, HPX smart executors, that determines the execution policy, chunk size, and prefetching distance of an HPX loop by feeding static information captured during compilation and dynamic information gathered at runtime to our learning model. Our evaluation shows that these smart executors speed up HPX execution by around 12% to 35% for the Matrix Multiplication, Stream, and 2D Stencil benchmarks, compared with setting the execution policy and parameters of their HPX loops manually or using HPX auto-parallelization techniques.
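The decision step described in the abstract can be illustrated with a small sketch. The abstract does not name the learning model, the loop features, or the candidate parameter values, so everything below is an assumption made for illustration: the sketch uses logistic regression, a hypothetical three-feature description of a loop, and made-up weights. It is written in Python for brevity and does not call the HPX API.

```python
import math

def features(iterations, ops_per_iteration, threads):
    """Build a hypothetical feature vector for one loop.

    The leading 1.0 is a bias term. Which features are static (compile time)
    and which are dynamic (runtime) is an assumption of this sketch.
    """
    return [1.0, math.log10(iterations), math.log10(ops_per_iteration), float(threads)]

def dot(weights, x):
    """Weighted sum of the features."""
    return sum(w * xi for w, xi in zip(weights, x))

def choose_policy(weights, x):
    """Binary logistic regression: pick 'par' when P(parallel) >= 0.5, else 'seq'."""
    p_parallel = 1.0 / (1.0 + math.exp(-dot(weights, x)))
    return "par" if p_parallel >= 0.5 else "seq"

def choose_candidate(weight_rows, candidates, x):
    """Multinomial choice among candidates (e.g. chunk sizes or prefetch distances).

    Softmax is monotonic, so the most probable class is simply the one
    with the highest linear score.
    """
    scores = [dot(w, x) for w in weight_rows]
    return candidates[scores.index(max(scores))]

# Illustrative, hand-picked weights; a real model would be trained on measurements.
POLICY_WEIGHTS = [-6.0, 1.0, 0.5, 0.1]

if __name__ == "__main__":
    small_loop = features(iterations=10, ops_per_iteration=1, threads=8)
    large_loop = features(iterations=1_000_000, ops_per_iteration=100, threads=8)
    print(choose_policy(POLICY_WEIGHTS, small_loop))
    print(choose_policy(POLICY_WEIGHTS, large_loop))
```

With these illustrative weights, a tiny loop stays sequential because its score is dominated by the negative bias, while a long loop with more work per iteration is run in parallel. The same `choose_candidate` helper could select a chunk size or a prefetching distance from a fixed list of candidates, given one weight row per candidate.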
