Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels

arXiv - CS - Performance, Pub Date: 2024-07-08, DOI: arxiv-2407.06134

Oluwaseun Adewunmi Alo, Sairam Sri Vatsavai, Ishan Thakkar



Abstract

Deep Neural Networks (DNNs) predominantly rely on General Matrix Multiply (GEMM) kernels, which are often accelerated using specialized hardware architectures. Recently, analog photonic GEMM accelerators have emerged as a promising alternative, offering vastly superior speed and energy efficiency compared to traditional electronic accelerators. However, these photonic accelerators cannot support integer operands wider than 4 bits because of their inherent trade-off between analog dynamic range and parallelism. This is often inadequate for DNN training, as operands at least 8 bits wide are deemed necessary to prevent significant accuracy drops. To address these limitations, we introduce a scalable photonic GEMM accelerator named SPOGA. SPOGA uses enhanced features such as analog summation of homodyne optical signals and in-transduction positional weighting of operands. By employing an extended optical-analog dataflow that minimizes the overheads associated with bit-sliced integer arithmetic, SPOGA supports byte-size integer GEMM kernels and achieves significant improvements in throughput, latency, and energy efficiency. Specifically, SPOGA demonstrates up to 14.4×, 2×, and 28.5× improvements in frames-per-second (FPS), FPS/Watt, and FPS/Watt/mm², respectively, compared to existing state-of-the-art photonic solutions.
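
The bit-sliced integer arithmetic mentioned above can be illustrated with a minimal Python sketch. It assumes a hypothetical accelerator that can only multiply 4-bit operands: each signed 8-bit operand x is split into a signed high slice and an unsigned low slice (x = 16*x_hi + x_lo), so one int8 GEMM becomes four 4-bit partial GEMMs combined with positional weights 1, 16, 16 and 256. The helper names (split_int8, slice_matrix, gemm, bit_sliced_gemm) are illustrative assumptions, and this is the generic textbook decomposition, not SPOGA's optical-analog dataflow.

```python
# Minimal sketch (assumption): generic bit-sliced int8 GEMM built from 4-bit
# partial GEMMs. All names here are hypothetical; this is not SPOGA's dataflow.
from typing import List, Tuple

Matrix = List[List[int]]

SLICE_BITS = 4  # assumed operand width of a narrow (4-bit) accelerator pass


def split_int8(x: int) -> Tuple[int, int]:
    """Split a signed 8-bit value into (high, low) 4-bit slices.

    high is signed in [-8, 7], low is unsigned in [0, 15], and
    x == high * 16 + low (Python's >> is an arithmetic shift).
    """
    if not -128 <= x <= 127:
        raise ValueError(f"{x} is not a signed 8-bit integer")
    return x >> SLICE_BITS, x & ((1 << SLICE_BITS) - 1)


def slice_matrix(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Return the high-slice and low-slice matrices of an int8 matrix."""
    hi = [[split_int8(v)[0] for v in row] for row in m]
    lo = [[split_int8(v)[1] for v in row] for row in m]
    return hi, lo


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Reference integer GEMM; here it also models one narrow-operand pass."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def bit_sliced_gemm(a: Matrix, b: Matrix) -> Matrix:
    """int8 GEMM as four 4-bit partial GEMMs scaled by positional weights."""
    a_hi, a_lo = slice_matrix(a)
    b_hi, b_lo = slice_matrix(b)
    # (slice of A, slice of B, positional weight 2**(4 * (p + q)))
    passes = [
        (a_lo, b_lo, 1),
        (a_lo, b_hi, 1 << SLICE_BITS),
        (a_hi, b_lo, 1 << SLICE_BITS),
        (a_hi, b_hi, 1 << (2 * SLICE_BITS)),
    ]
    rows, cols = len(a), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for a_s, b_s, weight in passes:
        # Each slice fits in 4 bits (signed high slice, unsigned low slice).
        partial = gemm(a_s, b_s)
        for i in range(rows):
            for j in range(cols):
                out[i][j] += weight * partial[i][j]  # digital shift-and-add
    return out


if __name__ == "__main__":
    a = [[-128, 127], [5, -3]]
    b = [[2, -1], [-7, 100]]
    print(bit_sliced_gemm(a, b))  # → [[-1145, 12828], [31, -305]]
    print(bit_sliced_gemm(a, b) == gemm(a, b))  # → True
```

In this sketch the positional weighting and accumulation are done digitally after each partial GEMM, which is the kind of bit-slicing overhead the abstract says SPOGA reduces by applying positional weighting during transduction and summing homodyne optical signals in the analog domain.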


