Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware: Latest Papers

Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Akihiro Hayashi, S. Paul, M. Grossman, J. Shirako, Vivek Sarkar
{"title":"Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages","authors":"Akihiro Hayashi, S. Paul, M. Grossman, J. Shirako, Vivek Sarkar","doi":"10.1145/3152041.3152086","DOIUrl":"https://doi.org/10.1145/3152041.3152086","url":null,"abstract":"With the shift to exascale computer systems, the importance of productive programming models for distributed systems is increasing. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in the application of PGAS programming models is the improvement of compilers and runtime systems. In particular, one open question is how runtime systems meet the requirement of exascale systems, where a large number of asynchronous tasks are executed. While there are various tasking runtimes such as Qthreads, OCR, and HClib, there is no existing comparative study on PGAS tasking/threading runtime systems. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on tasking and synchronization implementations. The results show that our OCR and HClib-based implementations can improve the performance of PGAS programs compared to the existing Qthreads backend of Chapel.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125655243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Extending the Open Community Runtime with External Application Support
J. Dokulil, S. Benkner
{"title":"Extending the Open Community Runtime with External Application Support","authors":"J. Dokulil, S. Benkner","doi":"10.1145/3152041.3152088","DOIUrl":"https://doi.org/10.1145/3152041.3152088","url":null,"abstract":"The Open Community Runtime specification prescribes the way a task-parallel application has to be written, in order to give the runtime system the ability to automatically migrate work and data, provide fault tolerance, improve portability, etc. These constraints prevent an application from efficiently starting a new process to run another external program. We have designed an extension of the specification which provides exactly this functionality in a way that fits the task-based model. The bulk of our work is devoted to exploring the way the task-parallel application can interact with an external application without having to resort to using files on a physical drive for data exchange. To eliminate the need to make changes to the external application, the data is exposed via a virtual file system using the filesystem-in-userspace architecture.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"344 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134427706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs
B. Peterson, A. Humphrey, John A. Schmidt, M. Berzins
{"title":"Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs","authors":"B. Peterson, A. Humphrey, John A. Schmidt, M. Berzins","doi":"10.1145/3152041.3152082","DOIUrl":"https://doi.org/10.1145/3152041.3152082","url":null,"abstract":"Large-scale parallel applications with complex global data dependencies beyond those of reductions pose significant scalability challenges in an asynchronous runtime system. Internodal challenges include identifying the all-to-all communication of data dependencies among the nodes. Intranodal challenges include gathering together these data dependencies into usable data objects while avoiding data duplication. This paper addresses these challenges within the context of a large-scale, industrial coal boiler simulation using the Uintah asynchronous many-task runtime system on GPU architectures. We show significant reduction in time spent analyzing data dependencies through refinements in our dependency search algorithm. Multiple task graphs are used to eliminate subsequent analysis when task graphs change in predictable and repeatable ways. Using a combined data store and task scheduler redesign reduces data dependency duplication ensuring that problems fit within host and GPU memory. These modifications did not require any changes to application code or sweeping changes to the Uintah runtime system. We report results running on the DOE Titan system on 119K CPU cores and 7.5K GPUs simultaneously. Our solutions can be generalized to other task dependency problems with global dependencies among thousands of nodes which must be processed efficiently at large scale.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121395189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Verification of the Extended Roofline Model for Asynchronous Many Task Runtimes
Joshua D. Suetterlein, Joshua Landwehr, A. Márquez, J. Manzano, K. Barker, G. Gao
{"title":"Verification of the Extended Roofline Model for Asynchronous Many Task Runtimes","authors":"Joshua D. Suetterlein, Joshua Landwehr, A. Márquez, J. Manzano, K. Barker, G. Gao","doi":"10.1145/3152041.3152087","DOIUrl":"https://doi.org/10.1145/3152041.3152087","url":null,"abstract":"Asynchronous Many Task (AMT) runtimes promise application designers the ability to better utilize novel hardware resources and to take advantage of the idle times that might arise from the discrepancies due to mismatches between software and hardware components. To foresee possible problems between hardware and software components (described as mismatches), designers usually use models to predict and analyze application behaviors. However, current models are ill suited for the AMT crowd because of its dynamic behavior and agility. To this effect, we developed an extended roofline model that aims to provide upper bounds on execution for AMT frameworks. This work focuses on the validation and error characterization of this model using different statistical techniques and a large set of experiments to evaluate and characterize its error and its sources. We found that in the worst case the error can grow to an order of magnitude; however, there are several techniques to increase the model accuracy given a machine configuration.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122200793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic Risk-based Selective Redundancy for Fault-tolerant Task-parallel HPC Applications
Omer Subasi, O. Unsal, S. Krishnamoorthy
{"title":"Automatic Risk-based Selective Redundancy for Fault-tolerant Task-parallel HPC Applications","authors":"Omer Subasi, O. Unsal, S. Krishnamoorthy","doi":"10.1145/3152041.3152083","DOIUrl":"https://doi.org/10.1145/3152041.3152083","url":null,"abstract":"Silent data corruption (SDC) and fail-stop errors are the most hazardous error types in high-performance computing (HPC) systems. In this study, we present an automatic, efficient and lightweight redundancy mechanism to mitigate both error types. We propose partial task-replication and checkpointing for task-parallel HPC applications to mitigate silent and fail-stop errors. To avoid the prohibitive costs of complete replication, we introduce a lightweight selective replication mechanism. Using a fully automatic and transparent heuristic, we identify and selectively replicate only the reliability-critical tasks based on a risk metric. Our approach detects and corrects around 70% of silent errors with only 5% average performance overhead. Additionally, the performance overhead of the heuristic itself is negligible.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125849717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Integrating OpenMP into the Charm++ Programming Model
Seonmyeong Bak, Harshitha Menon, Sam White, M. Diener, L. Kalé
{"title":"Integrating OpenMP into the Charm++ Programming Model","authors":"Seonmyeong Bak, Harshitha Menon, Sam White, M. Diener, L. Kalé","doi":"10.1145/3152041.3152085","DOIUrl":"https://doi.org/10.1145/3152041.3152085","url":null,"abstract":"The recent trend of rapid increase in the number of cores per chip has resulted in vast amounts of on-node parallelism. These high core counts result in hardware variability that introduces imbalance. Applications are also becoming more complex themselves, resulting in dynamic load imbalance. Load imbalance of any kind can result in loss of performance and decrease in system utilization. In this paper, we propose a new integrated runtime system that adds OpenMP shared-memory parallelism to the Charm++ distributed programming model to improve load balancing on distributed systems. Our proposal utilizes an infrequent periodic assignment of work to cores based on load measurement, in combination with tasks created via OpenMP's parallel loop construct from each core to handle load imbalance. We demonstrate the benefits of using this integrated runtime system on the LLNL ASC proxy application Lassen, achieving speedups of 50% over runs without any load balancing and 10% over existing distributed-memory-only balancing schemes in Charm++.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129041203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
HPX Smart Executors
Zahra Khatami, Lukas Troska, Hartmut Kaiser, J. Ramanujam, Adrian Serio
{"title":"HPX Smart Executors","authors":"Zahra Khatami, Lukas Troska, Hartmut Kaiser, J. Ramanujam, Adrian Serio","doi":"10.1145/3152041.3152084","DOIUrl":"https://doi.org/10.1145/3152041.3152084","url":null,"abstract":"The performance of many parallel applications depends on loop-level parallelism. However, manually parallelizing all loops may result in degrading parallel performance, as some of them cannot scale desirably to a large number of threads. In addition, the overheads of manually tuning loop parameters might prevent an application from reaching its maximum parallel performance. We illustrate how machine learning techniques can be applied to address these challenges. In this research, we develop a framework that is able to automatically capture the static and dynamic information of a loop. Moreover, we advocate a novel method by introducing HPX smart executors for determining the execution policy, chunk size, and prefetching distance of an HPX loop to achieve higher possible performance by feeding static information captured during compilation and runtime-based dynamic information to our learning model. Our evaluated execution results show that using these smart executors can speed up the HPX execution process by around 12% to 35% for the Matrix Multiplication, Stream and 2D Stencil benchmarks compared to setting their HPX loop's execution policy/parameters manually or using HPX auto-parallelization techniques.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114976450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware
D. Panda, K. Schulz, Khaled Hamidouche, H. Subramoni
{"title":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","authors":"D. Panda, K. Schulz, Khaled Hamidouche, H. Subramoni","doi":"10.1145/3152041","DOIUrl":"https://doi.org/10.1145/3152041","url":null,"abstract":"Welcome to ESPM2 '15 workshop! As the HPC field is heading to Exascale, the role of Programming Models and Middleware is getting more important. The objectives of this workshop are to bring together researchers working in this area and discuss the state-of-the-art developments in the field. The detailed workshop program is indicated in the previous page. We would like to thank all authors who submitted papers to this workshop. Special thanks go to the program committee members for providing us with high-quality reviews under tight deadlines. For each submitted paper, we were able to collect at least four reviews. We were able to receive 100% reviews on a tight deadline. Based on the reviews and online discussion among the PC members, a set of five regular papers and two short papers were selected. These papers reflect the state-of-the-art research and developments being conducted in the community in the emerging programming models and middleware area.","PeriodicalId":102432,"journal":{"name":"Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware","volume":"02 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127331056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1