Sensitivity Analysis for Time Dependent Problems: Optimal Checkpoint-Recompute HPC Workflows
Authors: V. Carey, H. Abbasi, I. Rodero, H. Kolla
Published in: 2014 9th Workshop on Workflows in Support of Large-Scale Science, 2014-11-16
DOI: 10.1109/WORKS.2014.15 (https://doi.org/10.1109/WORKS.2014.15)
Sensitivity analysis (SA) is a fundamental tool of uncertainty quantification (UQ). Adjoint-based SA is the optimal approach in many large-scale applications, such as the direct numerical simulation (DNS) of combustion. However, a major challenge of the adjoint workflow for time-dependent applications is the storage and I/O required for the application state. During the time-reversal portion of the workflow, the forward state is required in last-in-first-out order, and the resulting storage requirements at exascale are enormous. To mitigate this, application state is regenerated from checkpoints over short windows of application time. This approach drastically reduces the total volume of stored data, allows the state in the regeneration window to be cached in memory and on local SSDs, may accelerate application execution by reducing output frequency, and reduces the power overhead of I/O. We explore variations of this workflow, applied to a proxy for the SA of turbulent combustion, by varying the number of checkpoints, the state storage, and other regeneration options to find efficient implementations that minimize compute time or power consumption.
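The checkpoint-recompute pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model (explicit Euler for du/dt = -u^2), the objective J = u_N, and every name in the code (`step`, `step_adjoint`, `forward_with_checkpoints`, `adjoint_sweep`, `sensitivity`, `window`) are assumptions chosen only for illustration. The forward run keeps one checkpoint per window; the reverse sweep regenerates each window from its checkpoint, caches it in memory, and consumes the cached states in last-in-first-out order.

```python
def step(u, dt):
    """One explicit Euler step of the toy model du/dt = -u**2 (assumed model)."""
    return u - dt * u * u


def step_adjoint(lam, u, dt):
    """Adjoint of `step`, linearized about the forward state u."""
    return lam * (1.0 - 2.0 * dt * u)


def forward_with_checkpoints(u0, n_steps, dt, window):
    """Run forward, storing only the state at the start of each window."""
    checkpoints = {}
    u = u0
    for n in range(n_steps):
        if n % window == 0:
            checkpoints[n] = u
        u = step(u, dt)
    return u, checkpoints


def adjoint_sweep(checkpoints, n_steps, dt, window):
    """Reverse sweep for J = u_N; returns dJ/du_0."""
    lam = 1.0  # dJ/du_N
    for start in sorted(checkpoints, reverse=True):
        end = min(start + window, n_steps)
        # Regenerate states u_start .. u_{end-1} and cache them in memory.
        cache = [checkpoints[start]]
        for _ in range(start + 1, end):
            cache.append(step(cache[-1], dt))
        # Consume the cached forward states in last-in-first-out order.
        while cache:
            lam = step_adjoint(lam, cache.pop(), dt)
    return lam


def sensitivity(u0, n_steps=1000, dt=1e-3, window=50):
    """Forward run with checkpoints followed by the adjoint sweep."""
    _, checkpoints = forward_with_checkpoints(u0, n_steps, dt, window)
    return adjoint_sweep(checkpoints, n_steps, dt, window)
```

In this sketch, `window` sets the trade-off: a run of `n_steps` steps stores about `n_steps / window` checkpoints and caches at most `window` states at a time, at the cost of roughly one additional forward pass during the reverse sweep.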