Chiplet-Gym: Optimizing Chiplet-Based AI Accelerator Design With Reinforcement Learning

IF 3.6 · Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computers Pub Date : 2024-09-11 DOI:10.1109/TC.2024.3457740

Kaniz Mishty;Mehdi Sadi

IEEE Transactions on Computers, vol. 74, no. 1, pp. 43-56. https://ieeexplore.ieee.org/document/10677458/

Citations: 0

Abstract

Modern Artificial Intelligence (AI) workloads demand computing systems with large silicon area to sustain throughput and competitive performance. However, prohibitive manufacturing costs and yield limitations at advanced tech nodes and die-size reaching the reticle limit restrain us from achieving this. With the recent innovations in advanced packaging technologies, chiplet-based architectures have gained significant attention in the AI hardware domain. However, the vast design space of chiplet-based AI accelerator design and the absence of system and package-level co-design methodology make it difficult for the designer to find the optimum design point regarding Power, Performance, Area, and manufacturing Cost (PPAC). This paper presents Chiplet-Gym, a Reinforcement Learning (RL)-based optimization framework to explore the vast design space of chiplet-based AI accelerators, encompassing the resource allocation, placement, and packaging architecture. We analytically model the PPAC of the chiplet-based AI accelerator and integrate it into an OpenAI gym environment to evaluate the design points. We also explore non-RL-based optimization approaches and combine these two approaches to ensure the robustness of the optimizer. The optimizer-suggested design point achieves $1.52\times$ throughput, $0.27\times$ energy, and $0.89\times$ cost of its monolithic counterpart at iso-area.
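To illustrate the yield argument the abstract opens with, the following is a minimal sketch using the textbook Poisson defect model, in which the fraction of defect-free dies falls exponentially with die area. It is not the paper's PPAC model: the function names, the 800 mm² total area, the four-way split, and the defect density are all hypothetical values chosen for illustration, and packaging, die-to-die interface, and test costs are ignored.

```python
import math


def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a simple Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_system(total_area_mm2, num_dies, defects_per_mm2):
    """Silicon area (mm^2) fabricated per working system.

    Assumes the total area is split evenly across num_dies and that
    defective dies are discarded before assembly (known-good-die testing).
    """
    die_area = total_area_mm2 / num_dies
    return total_area_mm2 / poisson_yield(die_area, defects_per_mm2)


# Hypothetical inputs, not taken from the paper.
TOTAL_AREA_MM2 = 800.0   # near a typical reticle-limited die size
DEFECT_DENSITY = 0.001   # defects per mm^2 (0.1 per cm^2)

monolithic = silicon_per_good_system(TOTAL_AREA_MM2, 1, DEFECT_DENSITY)
chiplets = silicon_per_good_system(TOTAL_AREA_MM2, 4, DEFECT_DENSITY)
ratio = chiplets / monolithic

print(f"monolithic: {monolithic:.0f} mm^2 of silicon per good system")
print(f"4 chiplets: {chiplets:.0f} mm^2 of silicon per good system")
print(f"chiplet / monolithic silicon ratio: {ratio:.3f}")
```

Under these assumptions the silicon ratio reduces to exp(-D0 · (A_mono - A_chiplet)), so the advantage grows with both defect density and the size of the monolithic die. A full comparison would have to add the packaging and interconnect overheads that the abstract's PPAC model is said to capture.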


Source journal

IEEE Transactions on Computers (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.60

Self-citation rate: 5.40%

Articles published: 199

Review time: 6.0 months

Journal description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.