HPX Smart Executors

Zahra Khatami, Lukas Troska, Hartmut Kaiser, J. Ramanujam, Adrian Serio
Published in: Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware
Publication date: 2017-11-05
DOI: 10.1145/3152041.3152084 (https://doi.org/10.1145/3152041.3152084)
Citations: 7

Abstract

The performance of many parallel applications depends on loop-level parallelism. However, manually parallelizing every loop can degrade parallel performance, because some loops do not scale well to a large number of threads. In addition, the overhead of manually tuning loop parameters can prevent an application from reaching its maximum parallel performance. We illustrate how machine learning techniques can address these challenges. We develop a framework that automatically captures the static and dynamic information of a loop. We also introduce HPX smart executors, which determine the execution policy, chunk size, and prefetching distance of an HPX loop by feeding static information captured during compilation and dynamic information gathered at runtime to a learning model. Our results show that these smart executors speed up execution by around 12% to 35% for the Matrix Multiplication, Stream, and 2D Stencil benchmarks, compared with setting the HPX loop's execution policy and parameters manually or using HPX auto-parallelization techniques.
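
The idea of choosing loop parameters from loop features can be illustrated with a minimal sketch. The abstract does not say which features are collected or what form the learning model takes, so everything below is an assumption made for illustration: the `LoopFeatures` fields, the logistic-style `BinaryModel` for the execution policy, the per-candidate linear scores in `ChunkModel`, the candidate chunk fractions, and all weights are hypothetical. The sketch uses only the C++ standard library and makes no HPX calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical loop features. The abstract only says that static
// (compile-time) and dynamic (runtime) information is used.
struct LoopFeatures {
    double iterations;         // assumed dynamic feature: trip count
    double ops_per_iteration;  // assumed static feature: operations in the body
};

enum class Policy { Sequential, Parallel };

// Assumed binary model: a logistic function over log-scaled features.
// The weights are placeholders, not trained values.
struct BinaryModel {
    double bias;
    double w_iterations;
    double w_ops;
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Returns Parallel when the modelled probability is at least 0.5.
inline Policy choose_policy(const BinaryModel& m, const LoopFeatures& f) {
    assert(f.iterations >= 1.0 && f.ops_per_iteration >= 1.0);
    const double z = m.bias
                   + m.w_iterations * std::log10(f.iterations)
                   + m.w_ops * std::log10(f.ops_per_iteration);
    return sigmoid(z) >= 0.5 ? Policy::Parallel : Policy::Sequential;
}

// Assumed candidate chunk sizes, expressed as fractions of the trip count.
constexpr double kChunkFractions[4] = {0.001, 0.01, 0.1, 0.5};

// Assumed multi-class model: one (bias, w_iterations, w_ops) row per candidate.
struct ChunkModel {
    double rows[4][3];
};

// Picks the candidate with the highest linear score and converts its
// fraction into a chunk size of at least one iteration.
inline std::size_t choose_chunk_size(const ChunkModel& m, const LoopFeatures& f) {
    assert(f.iterations >= 1.0 && f.ops_per_iteration >= 1.0);
    const double x1 = std::log10(f.iterations);
    const double x2 = std::log10(f.ops_per_iteration);
    std::size_t best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < 4; ++i) {
        const double score = m.rows[i][0] + m.rows[i][1] * x1 + m.rows[i][2] * x2;
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    const double chunk = std::max(1.0, std::floor(kChunkFractions[best] * f.iterations));
    return static_cast<std::size_t>(chunk);
}
```

In a real setting the returned policy and chunk size would be handed to the parallel loop; a prefetching distance could be selected in the same way as the chunk size, with its own candidate set and weights.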