Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models

IF 65.3 · Q1 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Sergio Botelho, Ameya Joshi, Biswajit Khara, S. Sarkar, C. Hegde, Santi S. Adavani, B. Ganapathysubramanian
{"title":"Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models","authors":"Sergio Botelho, Ameya Joshi, Biswajit Khara, S. Sarkar, C. Hegde, Santi S. Adavani, B. Ganapathysubramanian","doi":"10.1109/MLHPCAI4S51975.2020.00013","DOIUrl":null,"url":null,"abstract":"Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs). Several (nearly data free) approaches have been recently reported that successfully solve PDEs, with examples including deep feed forward networks, generative networks, and deep encoder-decoder networks. However, practical adoption of these approaches is limited by the difficulty in training these models, especially to make predictions at large output resolutions (≥ 1024 × 1024).Here we report on a software framework for data parallel distributed deep learning that resolves the twin challenges of training these large SciML models training in reasonable time as well as distributing the storage requirements. Our framework provides several out of the box functionality including (a) loss integrity independent of number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods.We show excellent scalability of this framework on both cloud as well as HPC clusters, and report on the interplay between bandwidth, network topology and bare metal vs cloud. We deploy this approach to train generative models of sizes hitherto not possible, showing that neural PDE solvers can be viably trained for practical applications. We also demonstrate that distributed higher-order optimization methods are 2–3 × faster than stochastic gradient-based methods and provide minimal convergence drift with higher batch-size.","PeriodicalId":47667,"journal":{"name":"Foundations and Trends in Machine Learning","volume":"12 1","pages":"50-63"},"PeriodicalIF":65.3000,"publicationDate":"2020-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foundations and Trends in Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MLHPCAI4S51975.2020.00013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 7

Abstract

Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs). Several (nearly data-free) approaches that successfully solve PDEs have recently been reported, with examples including deep feed-forward networks, generative networks, and deep encoder-decoder networks. However, practical adoption of these approaches is limited by the difficulty of training these models, especially when making predictions at large output resolutions (≥ 1024 × 1024). Here we report on a software framework for data-parallel distributed deep learning that resolves the twin challenges of training these large SciML models in reasonable time and distributing the storage requirements. Our framework provides several out-of-the-box capabilities, including (a) loss integrity independent of the number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods. We show excellent scalability of this framework on both cloud and HPC clusters, and report on the interplay between bandwidth, network topology, and bare-metal versus cloud deployments. We deploy this approach to train generative models of sizes hitherto not possible, showing that neural PDE solvers can be viably trained for practical applications. We also demonstrate that distributed higher-order optimization methods are 2-3× faster than stochastic gradient-based methods and exhibit minimal convergence drift at larger batch sizes.
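To make the listed capabilities concrete, below is a minimal, hypothetical sketch of the data-parallel, data-free training pattern the abstract describes, written with PyTorch's torch.distributed API. It is not the authors' implementation: the ToyDecoder network, the finite-difference Poisson residual, and the Adam optimizer are illustrative stand-ins (the paper trains large generative/encoder-decoder models with distributed higher-order optimizers); only the distributed mechanics, a loss averaged across ranks via all-reduce and synchronized batch normalization, mirror items (a) and (b) above.

```python
# Hypothetical sketch of data-parallel, data-free PDE training (not the authors' code).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class ToyDecoder(torch.nn.Module):
    """Stand-in encoder-decoder: maps a source field f to a candidate solution u."""
    def __init__(self, width=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(1, width, 3, padding=1),
            torch.nn.BatchNorm2d(width),   # converted to SyncBatchNorm below
            torch.nn.ReLU(),
            torch.nn.Conv2d(width, 1, 3, padding=1),
        )

    def forward(self, f):
        return self.net(f)


def poisson_residual(u, f, h):
    """Data-free loss: mean-squared residual of -Laplacian(u) = f via a 5-point stencil."""
    stencil = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                           device=u.device).view(1, 1, 3, 3)
    lap = torch.nn.functional.conv2d(u, stencil) / h ** 2        # interior points only
    return ((-lap - f[..., 1:-1, 1:-1]) ** 2).mean()


def main():
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    device = int(os.environ.get("LOCAL_RANK", rank))
    torch.cuda.set_device(device)

    model = ToyDecoder()
    # (b) Synchronized batch normalization: statistics computed over all processes.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)
    model = DDP(model, device_ids=[device])

    # The paper reports distributed higher-order optimizers; Adam is substituted
    # here purely to keep the sketch self-contained.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    n, h = 256, 1.0 / 255          # per-rank grid (the paper targets >= 1024 x 1024)
    for step in range(200):
        f = torch.randn(8, 1, n, n, device=device)   # this rank's sampled source terms
        opt.zero_grad()
        loss = poisson_residual(model(f), f, h)
        loss.backward()                               # DDP averages gradients across ranks
        opt.step()

        # (a) Loss integrity: average the scalar loss over ranks so the reported
        # value does not depend on how many processes participate.
        global_loss = loss.detach().clone()
        dist.all_reduce(global_loss, op=dist.ReduceOp.SUM)
        global_loss /= world
        if rank == 0 and step % 50 == 0:
            print(f"step {step}: global residual loss {global_loss.item():.4e}")


if __name__ == "__main__":
    main()   # launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
```

Launched with torchrun, one process per GPU, DDP averages gradients during the backward pass while the explicit all-reduce keeps the reported loss identical regardless of process count; swapping the stand-in Adam optimizer for a distributed higher-order method would follow the paper's item (c).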
Source journal: Foundations and Trends in Machine Learning (Computer Science, Artificial Intelligence)
CiteScore: 108.50
Self-citation rate: 0.00%
Articles published: 5
Journal description: Each issue of Foundations and Trends® in Machine Learning comprises a monograph of at least 50 pages written by research leaders in the field. We aim to publish monographs that provide an in-depth, self-contained treatment of topics where there have been significant new developments. Typically, this means that the monographs we publish will contain a significant level of mathematical detail (to describe the central methods and/or theory for the topic at hand), and will not eschew these details by simply pointing to existing references. Literature surveys and original research papers do not fall within these aims.