Challenges and Opportunities for Dataflow Processing on Exascale Computers
J. Wozniak, M. Wilde, Ian T Foster
Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing, 2016. DOI: 10.1145/3292533.3292537
Computational applications critical to society in areas such as materials design, climate modeling, and energy production and distribution must be developed quickly and correctly. Such studies are typically carried out as computational experiments, in which large numbers of simulation and analysis tasks are strung together, formally or informally, into a workflow. Many innovative programming models for specifying workflows are based on implicitly parallel dataflow languages. Here, we posit that the dataflow model has the added benefit of being able to exploit the features of exascale computers, expected in ∼2023 [21]. Workflows and other outermost patterns, such as parameter sweeps, searches, and optimizations, can easily be expressed in dataflow languages. In this model, the dataflow structure is available to the runtime system, either statically or at runtime, which creates opportunities for many kinds of automated scheduling and resource-management decisions. Workflow applications on these systems will also introduce new requirements, such as high-performance data movement methods for in situ data analysis, and new challenges, such as varying reliability characteristics. In this paper, we describe three key exascale feature areas that will be exposed to users at the application level. First, a more complex storage hierarchy is expected. New cache types such as scratchpad memory may be available, heterogeneous RAM systems may have differing performance and reliability characteristics, and node-local storage may also be available. These resources are expected to be exposed to the application or middleware through advanced operating system and runtime features. Second, power budgets will likely be tighter and programmer-controlled power scaling will likely be available, requiring the application level to make performance/power tradeoffs. Third, a more complex
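The abstract's central technical claim is that in a dataflow program the runtime can see the task graph. This lets it decide automatically what to run and when. The Python below is a minimal sketch of that idea, assuming a toy in-process runtime. It is not the authors' system or the API of any real workflow language. The class `DataflowGraph`, its `add`/`run` methods, and the `simulate` stand-in are all hypothetical. Each task declares the tasks whose outputs it consumes. The runtime repeatedly finds every task whose inputs are already available and submits those tasks together to a thread pool. A parameter sweep therefore runs in parallel without any explicit parallel code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence, Tuple


class DataflowGraph:
    """Hypothetical toy dataflow runtime: a task fires once all its inputs exist."""

    def __init__(self) -> None:
        # task name -> (function, names of the tasks whose results it consumes)
        self.tasks: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}

    def add(self, name: str, fn: Callable[..., Any], deps: Sequence[str] = ()) -> None:
        self.tasks[name] = (fn, list(deps))

    def run(self, max_workers: int = 4) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        pending = dict(self.tasks)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            while pending:
                # The whole graph is visible here, so the runtime can pick out
                # every task whose inputs are ready and launch them in parallel.
                ready = [n for n, (_, deps) in pending.items()
                         if all(d in results for d in deps)]
                if not ready:
                    raise ValueError("cycle or missing dependency in dataflow graph")
                futures = {
                    n: pool.submit(pending[n][0], *[results[d] for d in pending[n][1]])
                    for n in ready
                }
                # Simplification: wait for the whole wave before scheduling more.
                for n, fut in futures.items():
                    results[n] = fut.result()
                    del pending[n]
        return results


def simulate(x: float) -> float:
    """Hypothetical stand-in for an expensive simulation task."""
    return x * x


if __name__ == "__main__":
    # A parameter sweep whose simulations feed a single analysis task.
    params = [1.0, 2.0, 3.0]
    g = DataflowGraph()
    for i, p in enumerate(params):
        g.add(f"sim{i}", lambda p=p: simulate(p))
    g.add("analyze", lambda *ys: sum(ys) / len(ys),
          deps=[f"sim{i}" for i in range(len(params))])
    print(g.run()["analyze"])
```

For simplicity this sketch schedules in level-synchronous waves. A production runtime would release each task as soon as its own inputs arrive. The same graph visibility is what the abstract suggests could also inform decisions about data placement in the storage hierarchy or power/performance tradeoffs.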