Optimizing Forward Computation in Adjoint Method via Multi-level Blocking
T. Ikeda, S. Ito, H. Nagao, T. Katagiri, Toru Nagai, M. Ogino
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018-01-28
DOI: https://doi.org/10.1145/3149457.3149458
Abstract
Data assimilation (DA) is a computational technique that integrates large-scale numerical simulations with observed data, and the adjoint method is classified as a non-sequential DA technique. The target model for the simulations in this paper is the phase-field model, which is often used to simulate the temporal evolution of the internal structures of materials. Because the phase-field method computes a continuous field, a naïve implementation of the adjoint method requires an enormous amount of computation time. One reason for the long computation time is that the amount of data required for the simulations is much larger than the cache capacity of the computer. To reduce memory access and achieve better performance, it is necessary to use computational blocking, which reuses data within the cache as much as possible. In this paper, we propose multi-level blocking to optimize forward computation in the adjoint method. The proposed multi-level blocking consists of spatial blocking, temporal blocking, and the blocking of multiple forward computations (MFB) in the adjoint method. We investigated the effectiveness of the proposed multi-level blocking on the Fujitsu PRIMEHPC FX100 supercomputer. Relative to execution without blocking, we attained a speed-up of 1.89x by applying spatial and temporal blocking, and a speed-up of 1.48x as the upper limit by applying MFB. We also attained a speed-up of 1.13x over execution without blocking by applying multi-level blocking.
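The loop structure behind spatial and temporal blocking can be illustrated with a minimal sketch. It is not the authors' implementation: a 1D explicit diffusion stencil stands in for the phase-field model, the overlapped-halo scheme is one common way to do temporal blocking and is assumed here, and all names (`step`, `forward_naive`, `forward_blocked`, `tile`, `tb`, `c`) are hypothetical. Blocking of multiple forward computations is not modeled.

```python
import numpy as np

def step(u, c):
    """One explicit 3-point stencil update; the two end values are held fixed."""
    v = u.copy()
    v[1:-1] = u[1:-1] + c * (u[:-2] - 2.0 * u[1:-1] + u[2:])
    return v

def forward_naive(u0, c, nsteps):
    """Unblocked forward run: every time step sweeps the whole array."""
    u = u0.copy()
    for _ in range(nsteps):
        u = step(u, c)
    return u

def forward_blocked(u0, c, nsteps, tile, tb):
    """Forward run with spatial tiles of width `tile`, each advanced up to `tb`
    time steps before moving to the next tile (assumed overlapped-halo scheme)."""
    n = u0.size
    u = u0.copy()
    for t0 in range(0, nsteps, tb):
        k = min(tb, nsteps - t0)          # time steps in this temporal block
        out = np.empty_like(u)
        for s in range(0, n, tile):       # spatial blocking
            e = min(s + tile, n)
            lo = max(s - k, 0)            # halo of k cells on each side,
            hi = min(e + k, n)            # clipped at the domain boundary
            local = u[lo:hi].copy()       # small working set meant to fit in cache
            for _ in range(k):            # temporal blocking: reuse the tile k times
                local = step(local, c)
            # Halo cells become stale one layer per step; the tile interior is exact.
            out[s:e] = local[s - lo:e - lo]
        u = out
    return u

rng = np.random.default_rng(0)
u0 = rng.random(257)
reference = forward_naive(u0, 0.2, 23)
blocked = forward_blocked(u0, 0.2, 23, tile=32, tb=4)
print(np.allclose(reference, blocked))
```

The halo width equals the number of fused time steps, so each tile recomputes a few cells redundantly in exchange for touching main memory once per temporal block instead of once per step. In NumPy this only demonstrates that the blocked schedule reproduces the unblocked result; any cache benefit would have to be measured in a compiled implementation.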