Memory-Efficient Adjoints via Graph Partitioning

2022 19th International Joint Conference on Computer Science and Software Engineering (JCSSE). Pub Date: 2022-06-22. DOI: 10.1109/jcsse54890.2022.9836288

Ekkapot Charoenwanit



Abstract

Derivative information plays a crucial role in the correctness and performance of scientific computing in a wide variety of domains, such as computational fluid dynamics (CFD) and financial engineering. The reverse mode of Algorithmic Differentiation (AD) is particularly efficient for computing derivatives of multivariate vector functions $F: R^{n}\mapsto R^{m}$ in which the number of inputs $n$ far exceeds the number of outputs $m$; such functions frequently appear as cost functions in numerical optimization kernels. In particular, reverse-mode AD is at the heart of the backpropagation algorithm widely used in machine learning. Reverse-mode AD requires that the control flow of the derivative program be reversed, meaning that the results of intermediate computations (in our case, the computational graph of $F$) must be stored either in memory or in secondary storage. This requirement leads to the memory wall problem, especially for large-scale numerical problems, where the results of intermediate computations cannot fit entirely in memory. In this paper, we present an algorithm called Memory-Efficient Adjoints (ME-Adjoints) that addresses the memory wall problem by dynamically applying a simple partitioning scheme to the computational graph of $F$ at runtime. Our approach employs operator overloading in C++ to achieve a fully automatic adjoining process, whereby derivative programs require only trivial changes to the code, as opposed to checkpointing techniques, which require substantial changes to the code.
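To make the storage requirement concrete, the following is a minimal sketch of tape-based reverse-mode AD using operator overloading in C++. It is not the ME-Adjoints algorithm and performs no graph partitioning; the names `Node`, `Tape`, `Var`, `make_input`, and `gradient` are hypothetical and chosen only for illustration. The sketch assumes that every operation has at most two arguments and that all variables share one tape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One recorded operation: indices of its two arguments and the local
// partial derivatives of the result with respect to each argument.
struct Node {
    std::size_t arg[2];
    double partial[2];
};

// The tape is a linearized computational graph; it grows by one node per
// overloaded operation, which is the source of the memory pressure.
struct Tape {
    std::vector<Node> nodes;
    std::size_t push(std::size_t a0, double p0, std::size_t a1, double p1) {
        nodes.push_back(Node{{a0, a1}, {p0, p1}});
        return nodes.size() - 1;
    }
};

// Active variable: a value plus the index of the tape node that produced it.
struct Var {
    Tape* tape;
    std::size_t index;
    double value;
};

// Inputs are recorded as nodes that point to themselves with zero partials.
Var make_input(Tape& t, double v) {
    std::size_t i = t.nodes.size();
    t.push(i, 0.0, i, 0.0);
    return Var{&t, i, v};
}

Var operator+(const Var& a, const Var& b) {
    assert(a.tape == b.tape);
    std::size_t i = a.tape->push(a.index, 1.0, b.index, 1.0);
    return Var{a.tape, i, a.value + b.value};
}

Var operator*(const Var& a, const Var& b) {
    assert(a.tape == b.tape);
    std::size_t i = a.tape->push(a.index, b.value, b.index, a.value);
    return Var{a.tape, i, a.value * b.value};
}

// Reverse sweep: visit the tape from last node to first and propagate
// adjoints from each result to its arguments.
std::vector<double> gradient(const Tape& t, const Var& y) {
    std::vector<double> adj(t.nodes.size(), 0.0);
    adj[y.index] = 1.0;
    for (std::size_t i = t.nodes.size(); i-- > 0;) {
        const Node& n = t.nodes[i];
        adj[n.arg[0]] += n.partial[0] * adj[i];
        adj[n.arg[1]] += n.partial[1] * adj[i];
    }
    return adj;
}
```

As a usage example, recording `y = x0 * x1 + x0` with inputs created by `make_input` and then calling `gradient(tape, y)` yields the adjoints of `x0` and `x1` at `x0.index` and `x1.index`. Because the reverse sweep can start only after the whole forward computation has been recorded, the tape must hold every node at once, which is the behavior that motivates partitioning or checkpointing.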

