{"title":"SwInt: A Non-Blocking Switch-Based Silicon Photonic Interposer Network for 2.5D Machine Learning Accelerators","authors":"Ebadollah Taheri;Mohammad Amin Mahdian;Sudeep Pasricha;Mahdi Nikdast","doi":"10.1109/JETCAS.2024.3429354","DOIUrl":null,"url":null,"abstract":"The surging demand for machine learning (ML) applications has emphasized the pressing need for efficient ML accelerators capable of addressing the computational and energy demands of increasingly complex ML models. However, the conventional monolithic design of large-scale ML accelerators on a single chip often entails prohibitively high fabrication costs. To address this challenge, this paper proposes a 2.5D chiplet-based architecture based on a silicon photonic interposer, called SwInt, to enable high bandwidth, low latency, and energy-efficient data movement on the interposer, for ML applications. Existing silicon photonic interposer implementations suffer from high power consumption attributed to their inefficient network designs, primarily relying on bus-based communication. Bus-based communication is not scalable, as it suffers from high power consumption of the optical laser due to cumulative losses on the readers and writers when the bandwidth per waveguide (i.e., wavelength division multiplexing degree) increases or the number of processing elements in ML accelerators scales up. SwInt incorporates a novel switch-based network designed using Mach-Zehnder Interferometer (MZI)-based switch cells for offering scalable interposer communication and reducing power consumption. The designed switch architecture avoids blocking using an efficient design, while minimizing the number of stages to offer a low-loss switch. Furthermore, the MZI switch cells are designed with a dividing state, enabling energy-efficient broadcast communication over the interposer and supporting broadcasting demand in ML accelerators. Additionally, we optimized and fabricated silicon photonic devices, Microring Resonators (MRRs) and MZIs, which are integral components of our network architecture. Our analysis shows that SwInt achieves, on average, 62% and 64% improvement in power consumption under, respectively, unicast and broadcast communication, resulting in 59.7% energy-efficiency improvement compared to the state-of-the-art silicon photonic interposers specifically designed for ML accelerators.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":3.7000,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10599539","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10599539/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
The surging demand for machine learning (ML) applications has emphasized the pressing need for efficient ML accelerators capable of addressing the computational and energy demands of increasingly complex ML models. However, the conventional monolithic design of large-scale ML accelerators on a single chip often entails prohibitively high fabrication costs. To address this challenge, this paper proposes a 2.5D chiplet-based architecture based on a silicon photonic interposer, called SwInt, to enable high bandwidth, low latency, and energy-efficient data movement on the interposer, for ML applications. Existing silicon photonic interposer implementations suffer from high power consumption attributed to their inefficient network designs, primarily relying on bus-based communication. Bus-based communication is not scalable, as it suffers from high power consumption of the optical laser due to cumulative losses on the readers and writers when the bandwidth per waveguide (i.e., wavelength division multiplexing degree) increases or the number of processing elements in ML accelerators scales up. SwInt incorporates a novel switch-based network designed using Mach-Zehnder Interferometer (MZI)-based switch cells for offering scalable interposer communication and reducing power consumption. The designed switch architecture avoids blocking using an efficient design, while minimizing the number of stages to offer a low-loss switch. Furthermore, the MZI switch cells are designed with a dividing state, enabling energy-efficient broadcast communication over the interposer and supporting broadcasting demand in ML accelerators. Additionally, we optimized and fabricated silicon photonic devices, Microring Resonators (MRRs) and MZIs, which are integral components of our network architecture. Our analysis shows that SwInt achieves, on average, 62% and 64% improvement in power consumption under, respectively, unicast and broadcast communication, resulting in 59.7% energy-efficiency improvement compared to the state-of-the-art silicon photonic interposers specifically designed for ML accelerators.
机器学习(ML)应用需求的激增,凸显了对高效 ML 加速器的迫切需求,这种加速器必须能够满足日益复杂的 ML 模型的计算和能源需求。然而,在单个芯片上进行大规模 ML 加速器的传统单片设计往往需要高昂的制造成本。为了应对这一挑战,本文提出了一种基于硅光子插层的 2.5D 芯片架构(称为 SwInt),可在插层上实现高带宽、低延迟和高能效的数据移动,适用于 ML 应用。现有的硅光子插层实施方案主要依赖基于总线的通信,其低效的网络设计导致功耗居高不下。基于总线的通信不可扩展,因为当每个波导的带宽(即波分复用度)增加或 ML 加速器的处理元件数量增加时,读写器上的累积损耗会导致光激光器的高功耗。SwInt 采用了一种基于开关的新型网络,使用基于马赫-泽恩德干涉仪(MZI)的开关单元进行设计,以提供可扩展的层间通信并降低功耗。所设计的开关架构采用高效设计,避免了阻塞,同时最大限度地减少了级数,从而实现了低损耗开关。此外,MZI 交换单元还设计了一种分割状态,从而实现了高能效的插接器广播通信,并支持 ML 加速器中的广播需求。此外,我们还优化并制造了硅光子器件、微oring 谐振器 (MRR) 和 MZI,它们是我们网络架构不可或缺的组成部分。我们的分析表明,与专为 ML 加速器设计的最先进的硅光子中间件相比,SwInt 在单播和广播通信下的功耗平均分别提高了 62% 和 64%,能效提高了 59.7%。
期刊介绍:
The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.