Towards energy-efficient scientific computing: Reversible numerical linear algebra kernels in floating-point arithmetic

IF 5.7 · Zone 3 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Sustainable Computing-Informatics & Systems Pub Date : 2026-01-01 Epub Date: 2025-12-20 DOI:10.1016/j.suscom.2025.101261

V. Dwarka

Volume 49, Article 101261. Full text: https://www.sciencedirect.com/science/article/pii/S2210537925001829

Citation count: 0

Abstract

Frontier scientific and AI workloads now reach $10^{19}$–$10^{25}$ fused multiply–add (FMA) operations per run (on the order of $2 \times 10^{19}$–$2 \times 10^{25}$ FLOPs). At today’s $\sim 10$ pJ per FMA, this corresponds to approximately $10^{8}$–$10^{14}$ joules of arithmetic energy. At this scale, energy becomes the limiting resource for continued growth in computational workloads, motivating a re-evaluation of long-standing algorithmic assumptions. It is often assumed that reversible computing only matters near the Landauer limit. Building on prior physical arguments that full energy recovery is only possible when computation preserves information, we demonstrate that this same requirement governs floating-point numerical kernels: overwriting state enforces a non-zero energy floor, even under ideal recovery. Thus, eliminating this wall in practice requires that the numerical algorithm itself be injective. We therefore present the first reversible floating-point realizations of core dense numerical kernels (matrix multiplication, LU factorization, and conjugate-gradient iteration) that retain rounding information rather than discarding it. Implemented directly in IEEE arithmetic, they achieve machine-precision forward–reverse agreement on well- and ill-conditioned problems with minimal auxiliary state. A toggle-based model with measured switching costs and realistic recovery factors predicts $10^{3}$–$10^{4}\times$ reductions in arithmetic energy. These results establish injective floating-point kernels as a foundation for energy-recovering numerical computation, and indicate that realizing this potential will require sustained co-design across applied mathematics, computer science, and hardware engineering.
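
The abstract does not say how the kernels retain rounding information, so the following is only a generic illustration and not the paper's construction. It uses Knuth's classical TwoSum error-free transformation to show why an ordinary floating-point addition is not injective and how keeping the rounding error makes the step recoverable. The names `two_sum` and `undo_add` are hypothetical, and the reverse step uses exact rational arithmetic purely for checking.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly.

    Assumes IEEE binary64 round-to-nearest and no overflow.
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def undo_add(s: float, e: float, b: float) -> float:
    """Recover a from the retained pair (s, e) and the addend b.

    Exact rational arithmetic is used so the reverse step itself cannot round.
    """
    return float(Fraction(s) + Fraction(e) - Fraction(b))


if __name__ == "__main__":
    b = 1e17
    for a in (0.1, 0.2):
        s, e = two_sum(a, b)
        # Both inputs round to the same sum s, so s alone cannot identify a;
        # the retained error term e is what distinguishes them.
        print(f"a={a!r}  s={s!r}  e={e!r}  recovered={undo_add(s, e, b)!r}")
```

For a fixed addend `b = 1e17`, both `0.1` and `0.2` round to the same sum, so overwriting the input with the sum discards information. Keeping the error term alongside the sum makes the map from input to output one-to-one, at the cost of one extra stored value per operation in this naive sketch.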

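The energy range quoted in the abstract follows from a direct multiplication, worked out here as a check. The symbols $N_{\text{FMA}}$ (FMA count per run) and $e_{\text{FMA}}$ (energy per FMA) are notation introduced for this note and do not come from the paper. The FLOP figures assume the usual convention that one FMA counts as two floating-point operations.

```latex
\begin{align*}
E_{\text{arith}} &\approx N_{\text{FMA}} \, e_{\text{FMA}},
  \qquad e_{\text{FMA}} \approx 10\ \text{pJ} = 10^{-11}\ \text{J} \\
N_{\text{FMA}} = 10^{19}: \quad E_{\text{arith}} &\approx 10^{19} \times 10^{-11}\ \text{J} = 10^{8}\ \text{J} \\
N_{\text{FMA}} = 10^{25}: \quad E_{\text{arith}} &\approx 10^{25} \times 10^{-11}\ \text{J} = 10^{14}\ \text{J} \\
\text{FLOPs} &= 2\, N_{\text{FMA}} \in \left[\, 2 \times 10^{19},\ 2 \times 10^{25} \,\right]
\end{align*}
```

For scale, with 1 kWh = $3.6 \times 10^{6}$ J, these endpoints are roughly 28 kWh and 28 GWh of arithmetic energy per run.
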

Source journal

Sustainable Computing-Informatics & Systems (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 10.70

Self-citation rate: 4.40%

Publication volume: 142

About the journal: Sustainable computing is a rapidly expanding research area spanning computer science and engineering, electrical engineering, and other engineering disciplines. The aim of Sustainable Computing: Informatics and Systems (SUSCOM) is to publish research findings on energy-aware and thermal-aware management of computing resources. Equally important is a spectrum of related research issues, such as applications of computing that can have ecological and societal impacts. SUSCOM publishes original and timely research papers and survey articles on power-, energy-, temperature-, and environment-related topics of current importance to readers. SUSCOM has an editorial board comprising prominent researchers from around the world and selects competitively evaluated, peer-reviewed papers.