OPIMA: Optical Processing-in-Memory for Convolutional Neural Network Acceleration

IF 2.7 · CAS Tier 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Febin Sunny;Amin Shafiee;Abhishek Balasubramaniam;Mahdi Nikdast;Sudeep Pasricha
{"title":"OPIMA: Optical Processing-in-Memory for Convolutional Neural Network Acceleration","authors":"Febin Sunny;Amin Shafiee;Abhishek Balasubramaniam;Mahdi Nikdast;Sudeep Pasricha","doi":"10.1109/TCAD.2024.3446870","DOIUrl":null,"url":null,"abstract":"Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, dynamic random-access memory-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. In addition, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve \n<inline-formula> <tex-math>$2.98\\times $ </tex-math></inline-formula>\n higher throughput and \n<inline-formula> <tex-math>$137\\times $ </tex-math></inline-formula>\n better energy efficiency than the best known prior work.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3888-3899"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745860/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional von Neumann architectures to their limits due to the high latency and energy costs of moving data between the processor and memory for these workloads. One solution to this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and its associated costs. However, dynamic random-access memory (DRAM)-based PIM struggles to achieve high throughput and energy efficiency due to internal data-movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator architected within an optical main memory. OPIMA is designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks (CNNs). We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. In addition, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve $2.98\times$ higher throughput and $137\times$ better energy efficiency than the best known prior work.
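To make the PIM computation pattern concrete, the sketch below (plain NumPy; an illustration of the general technique, not OPIMA's actual optical implementation) shows the standard im2col lowering that turns a convolution layer into a batch of dot products. In a PIM accelerator, each of these dot products is the unit of work evaluated inside the memory array rather than after a round trip to the processor. All function names and shapes here are hypothetical.

```python
import numpy as np

def im2col(x, k):
    """Unroll every k-by-k patch of a (H, W) input into one row of a matrix."""
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_as_matmul(x, weights):
    """Convolution (cross-correlation, as in CNN frameworks) lowered to
    one matrix-matrix product.

    In a PIM accelerator, each row-by-row dot product below is evaluated
    in place inside the memory array instead of on a separate processor.
    """
    k = weights.shape[-1]
    cols = im2col(x, k)                        # (out_h*out_w, k*k)
    w = weights.reshape(weights.shape[0], -1)  # (filters, k*k)
    out = cols @ w.T                           # all dot products at once
    out_h = x.shape[0] - k + 1
    return out.T.reshape(weights.shape[0], out_h, -1)

x = np.random.rand(8, 8)
w = np.random.rand(4, 3, 3)              # 4 filters, 3x3 each
print(conv2d_as_matmul(x, w).shape)      # (4, 6, 6)
```

Every row of the unrolled patch matrix can be multiplied against all filter rows simultaneously; that independence is exactly the massive parallelism an in-memory array, electronic or photonic, is built to exploit.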
Source journal metrics:
CiteScore: 5.60
Self-citation rate: 13.80%
Articles per year: 500
Review turnaround: 7 months
Journal scope: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.