{"title":"Challenges and Opportunities for Dataflow Processing on Exascale Computers","authors":"J. Wozniak, M. Wilde, Ian T Foster","doi":"10.1145/3292533.3292537","DOIUrl":"https://doi.org/10.1145/3292533.3292537","url":null,"abstract":"Computational applications critical to society in areas such as materials design, climate modeling, and energy production and distribution must be developed quickly and correctly. Such studies are typically done via computational experiments, in which large numbers of simulation and analysis tasks are strung together into a workflow, formally or informally. Many innovative programming models to handle workflow specifications are based on an implicitly parallel dataflow language to support this model. Here, we posit that the dataflow model has the added benefit that it can exploit features of exascale computers, expected in ∼2023 [21]. Workflows and other outermost patterns such as parameter sweeps, searches, and optimizations can easily be expressed with dataflow languages. In this model, either statically or at runtime a dataflow structure is available to the runtime system, presenting the opportunity for many types of automated decisions for scheduling and resource management. Workflow applications on these systems will also introduce new requirements, such as high-performance data movement methods for in situ data analysis, and new challenges, such as varying reliability characteristics. In this paper, we describe three key exascale feature areas that will be exposed to users at the application level. First, a more complex storage hierarchy is expected. New cache types such as scratchpad memory may be available, and heterogeneous RAM systems may have differing performance and reliability characteristics. Node-local storage may be available. These systems are expected to be available to the application or middleware via advanced operating system and runtime features. Second, tighter power budgets and programmer-controlled power scaling will likely be available. These will require the application level to make decisions about performance/power tradeoffs. Third, a more complex","PeriodicalId":195082,"journal":{"name":"Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114718340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CnC: A Dependence Programming Model","authors":"Zoran Budimlic, K. Knobe","doi":"10.1145/3292533.3292536","DOIUrl":"https://doi.org/10.1145/3292533.3292536","url":null,"abstract":"Application tuning is one of the major hurdles on the road to exascale computing. Tuning is often directed at a specific architecture or towards some specific tuning goal. As currently practiced, the tuning activity requires serious expertise in the application domain, target architecture and tuning goals. Keeping all these (sometimes conflicting) concerns in mind at the same time while developing a program is very difficult and error prone. Dependence programming is a class of dataflow programming in which both data and control flow are explicit, are distinguished and are given equal weight. This paper gives an overview of CnC, a dataflow dependence programming model, from the perspective of the needs of exascale computing and shows how CnC addresses those needs through a separation of concerns. The goal of CnC is to enable a process that is easier, less error-prone and more effective, by separating these concerns into independent activities. Rather than proposing yet another approach to tuning, CnC provides a way to specify the application in a way that a) hides details not relevant to tuning, b) includes as much detail as possible to support the analysis and tuning process, and c) does not assume any specific architecture, style of tuning or tuning goals. This results in a specification of the application that can be applied to any existing or future architectures and can use any existing or future tuning approach. At the same time it can be more efficient in human time spent on creating the application and more effective in finding the best tuning.","PeriodicalId":195082,"journal":{"name":"Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128349417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven execution of the Tile LU Decomposition","authors":"George Matheou, C. Kyriacou, P. Evripidou","doi":"10.1145/3292533.3292534","DOIUrl":"https://doi.org/10.1145/3292533.3292534","url":null,"abstract":"The objective of this paper is to analyze, develop and evaluate the tile LU Decomposition using the FREDDO framework. FREDDO is a C++ framework, based on the DDM model of execution, that supports efficient data-driven execution on conventional processors. The performance evaluation shows that FREDDO scales well and tolerates scheduling overheads and memory latencies effectively. The LU implementation is evaluated in both single-node and distributed execution environments. In both cases our framework achieves very good speedups, especially in the larger problem sizes. Particularly, our framework achieves up to 97% of the maximum possible speedup on a single-node and up to 90% of the maximum possible speedup on a 4-node cluster with a total of 128 cores.","PeriodicalId":195082,"journal":{"name":"Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117204147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tree-based Read-only Data Chunks for NVRAM Programming","authors":"Kumud Bhandari, Vivek Sarkar","doi":"10.1145/3292533.3292535","DOIUrl":"https://doi.org/10.1145/3292533.3292535","url":null,"abstract":"As DRAM technology is fast reaching a scaling threshold, emerging non-volatile, byte-addressable memory (NVRAM) is expected to supplement and eventually replace DRAM. Future computing systems are anticipated to have a large amount of NVRAM, possibly spanning across more than one coherence domain. Furthermore, taking advantage of in-place persistence provided by the NVRAM in future systems requires a strategy to prevent tolerated failures (e.g. power failure) from leaving persistent data in an incoherent state. A fresh look at memory management approaches across the system stack is required to fully utilize future NVRAM. In this paper, we carefully assess the NVRAM-related memory access and management challenges and their implications for application-level programming, and examine the suitability of tree-based read-only data chunks to NVRAM programming.","PeriodicalId":195082,"journal":{"name":"Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116803261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing","authors":"","doi":"10.1145/3292533","DOIUrl":"https://doi.org/10.1145/3292533","url":null,"abstract":"","PeriodicalId":195082,"journal":{"name":"Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115434241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}