用于大型语言模型训练和推理的光连接多栈HBM模块

IF 1.4 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters Pub Date : 2025-02-18 DOI:10.1109/LCA.2025.3540058

Yanghui Ou;Hengrui Zhang;Austin Rovinski;David Wentzlaff;Christopher Batten

{"title":"用于大型语言模型训练和推理的光连接多栈HBM模块","authors":"Yanghui Ou;Hengrui Zhang;Austin Rovinski;David Wentzlaff;Christopher Batten","doi":"10.1109/LCA.2025.3540058","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs) have grown exponentially in size, presenting significant challenges to traditional memory architectures. Current high bandwidth memory (HBM) systems are constrained by chiplet I/O bandwidth and the limited number of HBM stacks that can be integrated due to packaging constraints. In this letter, we propose a novel memory system architecture that leverages silicon photonic interconnects to increase memory capacity and bandwidth for compute devices. By introducing optically connected multi-stack HBM modules, we extend the HBM memory system off the compute chip, significantly increasing the number of HBM stacks. Our evaluations show that this architecture can improve training efficiency for a trillion-parameter model by 1.4× compared to a modeled A100 baseline, while also enhancing inference performance by 4.2× if the L2 is modified to provide sufficient bandwidth.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 1","pages":"49-52"},"PeriodicalIF":1.4000,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optically Connected Multi-Stack HBM Modules for Large Language Model Training and Inference\",\"authors\":\"Yanghui Ou;Hengrui Zhang;Austin Rovinski;David Wentzlaff;Christopher Batten\",\"doi\":\"10.1109/LCA.2025.3540058\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large language models (LLMs) have grown exponentially in size, presenting significant challenges to traditional memory architectures. Current high bandwidth memory (HBM) systems are constrained by chiplet I/O bandwidth and the limited number of HBM stacks that can be integrated due to packaging constraints. In this letter, we propose a novel memory system architecture that leverages silicon photonic interconnects to increase memory capacity and bandwidth for compute devices. By introducing optically connected multi-stack HBM modules, we extend the HBM memory system off the compute chip, significantly increasing the number of HBM stacks. Our evaluations show that this architecture can improve training efficiency for a trillion-parameter model by 1.4× compared to a modeled A100 baseline, while also enhancing inference performance by 4.2× if the L2 is modified to provide sufficient bandwidth.\",\"PeriodicalId\":51248,\"journal\":{\"name\":\"IEEE Computer Architecture Letters\",\"volume\":\"24 1\",\"pages\":\"49-52\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2025-02-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Computer Architecture Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10892009/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Computer Architecture Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10892009/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

大型语言模型（llm）的规模呈指数级增长，对传统的内存体系结构提出了重大挑战。当前的高带宽存储器（HBM）系统受到芯片I/O带宽的限制，并且由于封装限制，可以集成的HBM堆栈数量有限。在这封信中，我们提出了一种新的存储系统架构，利用硅光子互连来增加计算设备的存储容量和带宽。通过引入光连接的多层HBM模块，我们将HBM存储系统扩展到计算芯片之外，显著增加了HBM堆栈的数量。我们的评估表明，与建模的A100基线相比，该架构可以将万亿参数模型的训练效率提高1.4倍，同时如果修改L2以提供足够的带宽，则还可以将推理性能提高4.2倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Optically Connected Multi-Stack HBM Modules for Large Language Model Training and Inference

Large language models (LLMs) have grown exponentially in size, presenting significant challenges to traditional memory architectures. Current high bandwidth memory (HBM) systems are constrained by chiplet I/O bandwidth and the limited number of HBM stacks that can be integrated due to packaging constraints. In this letter, we propose a novel memory system architecture that leverages silicon photonic interconnects to increase memory capacity and bandwidth for compute devices. By introducing optically connected multi-stack HBM modules, we extend the HBM memory system off the compute chip, significantly increasing the number of HBM stacks. Our evaluations show that this architecture can improve training efficiency for a trillion-parameter model by 1.4× compared to a modeled A100 baseline, while also enhancing inference performance by 4.2× if the L2 is modified to provide sufficient bandwidth.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Computer Architecture Letters COMPUTER SCIENCE, HARDWARE & ARCHITECTURE-

CiteScore

4.60

自引率

4.30%

发文量

期刊介绍： IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.