Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware: Latest Articles

Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2017-11-12 DOI: 10.1145/3152041.3152086

Akihiro Hayashi, S. Paul, M. Grossman, J. Shirako, Vivek Sarkar

Abstract: With the shift to exascale computer systems, the importance of productive programming models for distributed systems is increasing. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in the application of PGAS programming models is the improvement of compilers and runtime systems. In particular, one open question is how runtime systems can meet the requirements of exascale systems, where a large number of asynchronous tasks are executed. While there are various tasking runtimes such as Qthreads, OCR, and HClib, there is no existing comparative study on PGAS tasking/threading runtime systems. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on tasking and synchronization implementations. The results show that our OCR-based and HClib-based implementations can improve the performance of PGAS programs compared to the existing Qthreads backend of Chapel.

Citations: 12

Extending the Open Community Runtime with External Application Support

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2017-11-12 DOI: 10.1145/3152041.3152088

J. Dokulil, S. Benkner

Citations: 0

Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2017-11-12 DOI: 10.1145/3152041.3152082

B. Peterson, A. Humphrey, John A. Schmidt, M. Berzins

Abstract: Large-scale parallel applications with complex global data dependencies beyond those of reductions pose significant scalability challenges in an asynchronous runtime system. Internodal challenges include identifying the all-to-all communication of data dependencies among the nodes. Intranodal challenges include gathering these data dependencies into usable data objects while avoiding data duplication. This paper addresses these challenges within the context of a large-scale, industrial coal boiler simulation using the Uintah asynchronous many-task runtime system on GPU architectures. We show a significant reduction in time spent analyzing data dependencies through refinements in our dependency search algorithm. Multiple task graphs are used to eliminate subsequent analysis when task graphs change in predictable and repeatable ways. A combined data store and task scheduler redesign reduces data dependency duplication, ensuring that problems fit within host and GPU memory. These modifications did not require any changes to application code or sweeping changes to the Uintah runtime system. We report results running on the DOE Titan system on 119K CPU cores and 7.5K GPUs simultaneously. Our solutions can be generalized to other task dependency problems with global dependencies among thousands of nodes that must be processed efficiently at large scale.

Citations: 10

Verification of the Extended Roofline Model for Asynchronous Many Task Runtimes

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2017-11-12 DOI: 10.1145/3152041.3152087

Joshua D. Suetterlein, Joshua Landwehr, A. Márquez, J. Manzano, K. Barker, G. Gao

Citations: 0

Automatic Risk-based Selective Redundancy for Fault-tolerant Task-parallel HPC Applications

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2017-11-12 DOI: 10.1145/3152041.3152083

Omer Subasi, O. Unsal, S. Krishnamoorthy

Citations: 2

Integrating OpenMP into the Charm++ Programming Model

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2017-11-12 DOI: 10.1145/3152041.3152085

Seonmyeong Bak, Harshitha Menon, Sam White, M. Diener, L. Kalé

Citations: 5

HPX Smart Executors

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2017-11-05 DOI: 10.1145/3152041.3152084

Zahra Khatami, Lukas Troska, Hartmut Kaiser, J. Ramanujam, Adrian Serio

Abstract: The performance of many parallel applications depends on loop-level parallelism. However, manually parallelizing all loops may degrade parallel performance, as some of them cannot scale desirably to a large number of threads. In addition, the overheads of manually tuning loop parameters might prevent an application from reaching its maximum parallel performance. We illustrate how machine learning techniques can be applied to address these challenges. In this research, we develop a framework that is able to automatically capture the static and dynamic information of a loop. Moreover, we advocate a novel method by introducing HPX smart executors for determining the execution policy, chunk size, and prefetching distance of an HPX loop to achieve the highest possible performance by feeding static information captured during compilation and runtime-based dynamic information to our learning model. Our evaluation results show that using these smart executors can speed up the HPX execution process by around 12% to 35% for the Matrix Multiplication, Stream, and 2D Stencil benchmarks compared to setting their HPX loop's execution policy/parameters manually or using HPX auto-parallelization techniques.

Citations: 7

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware Pub Date : 2015-11-15 DOI: 10.1145/3152041

D. Panda, K. Schulz, Khaled Hamidouche, H. Subramoni

Citations: 1