Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan G. Thakkar, S. A. Salehi, J. Hastings
{"title":"SCONNA:一种基于随机计算的超快速、高效推理整量化cnn的光加速器","authors":"Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan G. Thakkar, S. A. Salehi, J. Hastings","doi":"10.1109/IPDPS54959.2023.00061","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) are used extensively for artificial intelligence applications due to their record-breaking accuracy. For efficient and swift hardware-based acceleration, CNNs are typically quantized to have integer input/weight parameters. The acceleration of a CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations. Several photonic microring resonators (MRRs) based hardware architectures have been proposed to accelerate integer-quantized CNNs with remarkably higher throughput and energy efficiency compared to their electronic counterparts. However, the existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size, which severely restricts their achievable VDP operation size for the quantized input/weight precision of 4 bits and higher. The restricted VDP operation size ultimately suppresses computing throughput to severely diminish the achievable performance benefits. To address this shortcoming, we for the first time present a merger of stochastic computing and MRR-based CNN accelerators. To leverage the innate precision flexibility of stochastic computing, we invent an MRR-based optical stochastic multiplier (OSM). We employ multiple OSMs in a cascaded manner using dense wavelength division multiplexing, to forge a novel Stochastic Computing based Optical Neural Network Accelerator (SCONNA). SCONNA achieves significantly high throughput and energy efficiency for accelerating inferences of high-precision quantized CNNs. Our evaluation for the inference of four modern CNNs at 8-bit input/weight precision indicates that SCONNA provides improvements of up to 66.5×, 90× and 91× in frames-per-second (FPS), FPS/W and FPS/W/mm2 respectively, on average over two photonic MRR-based analog CNN accelerators from prior work, with Top-1 accuracy drop of only up to 0.4% for large CNNs and up to 1.5% for small CNNs. We developed a transaction-level, event-driven python-based simulator for the evaluation of SCONNA and other accelerators (https://github.com/uky-UCAT/SC_ONN_SIM.git).","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs\",\"authors\":\"Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan G. Thakkar, S. A. Salehi, J. Hastings\",\"doi\":\"10.1109/IPDPS54959.2023.00061\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Networks (CNNs) are used extensively for artificial intelligence applications due to their record-breaking accuracy. For efficient and swift hardware-based acceleration, CNNs are typically quantized to have integer input/weight parameters. The acceleration of a CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations. 
Several photonic microring resonators (MRRs) based hardware architectures have been proposed to accelerate integer-quantized CNNs with remarkably higher throughput and energy efficiency compared to their electronic counterparts. However, the existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size, which severely restricts their achievable VDP operation size for the quantized input/weight precision of 4 bits and higher. The restricted VDP operation size ultimately suppresses computing throughput to severely diminish the achievable performance benefits. To address this shortcoming, we for the first time present a merger of stochastic computing and MRR-based CNN accelerators. To leverage the innate precision flexibility of stochastic computing, we invent an MRR-based optical stochastic multiplier (OSM). We employ multiple OSMs in a cascaded manner using dense wavelength division multiplexing, to forge a novel Stochastic Computing based Optical Neural Network Accelerator (SCONNA). SCONNA achieves significantly high throughput and energy efficiency for accelerating inferences of high-precision quantized CNNs. Our evaluation for the inference of four modern CNNs at 8-bit input/weight precision indicates that SCONNA provides improvements of up to 66.5×, 90× and 91× in frames-per-second (FPS), FPS/W and FPS/W/mm2 respectively, on average over two photonic MRR-based analog CNN accelerators from prior work, with Top-1 accuracy drop of only up to 0.4% for large CNNs and up to 1.5% for small CNNs. We developed a transaction-level, event-driven python-based simulator for the evaluation of SCONNA and other accelerators (https://github.com/uky-UCAT/SC_ONN_SIM.git).\",\"PeriodicalId\":343684,\"journal\":{\"name\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"147 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS54959.2023.00061\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs
Convolutional Neural Networks (CNNs) are used extensively in artificial intelligence applications due to their record-breaking accuracy. For efficient and swift hardware-based acceleration, CNNs are typically quantized to have integer input/weight parameters. Accelerating a CNN inference task involves convolution operations that are typically transformed into vector-dot-product (VDP) operations. Several photonic microring resonator (MRR)-based hardware architectures have been proposed to accelerate integer-quantized CNNs with remarkably higher throughput and energy efficiency than their electronic counterparts. However, the existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and the VDP operation size, which severely restricts their achievable VDP operation size for quantized input/weight precisions of 4 bits and higher. The restricted VDP operation size ultimately suppresses computing throughput, severely diminishing the achievable performance benefits. To address this shortcoming, we present, for the first time, a merger of stochastic computing and MRR-based CNN accelerators. To leverage the innate precision flexibility of stochastic computing, we invent an MRR-based optical stochastic multiplier (OSM). We employ multiple OSMs in a cascaded manner using dense wavelength division multiplexing (DWDM) to forge a novel Stochastic Computing based Optical Neural Network Accelerator (SCONNA). SCONNA achieves very high throughput and energy efficiency for accelerating the inference of high-precision quantized CNNs. Our evaluation of the inference of four modern CNNs at 8-bit input/weight precision indicates that SCONNA provides improvements of up to 66.5×, 90×, and 91× in frames-per-second (FPS), FPS/W, and FPS/W/mm², respectively, on average over two photonic MRR-based analog CNN accelerators from prior work, with a Top-1 accuracy drop of only up to 0.4% for large CNNs and up to 1.5% for small CNNs. We developed a transaction-level, event-driven, Python-based simulator for the evaluation of SCONNA and other accelerators (https://github.com/uky-UCAT/SC_ONN_SIM.git).
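The abstract notes that convolution operations are typically transformed into vector-dot-product (VDP) operations before acceleration. As a minimal sketch of that lowering for a toy single-channel, stride-1 case (the helper name and the toy inputs are illustrative assumptions, not code from the paper), each output pixel becomes one VDP between a flattened image patch and the flattened kernel:

```python
import numpy as np

def conv2d_as_vdp(image, kernel):
    """Lower a single-channel, stride-1, valid 2D convolution to a batch of
    vector-dot-product (VDP) operations: each output pixel is the dot product
    of a flattened image patch with the flattened kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    w_vec = kernel.reshape(-1)                             # flattened weight vector
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw].reshape(-1)  # flattened input vector
            out[i, j] = patch @ w_vec                      # one VDP operation
    return out

if __name__ == "__main__":
    img = np.arange(25, dtype=np.int64).reshape(5, 5)      # toy 5x5 integer image
    ker = np.array([[1, 0], [0, -1]], dtype=np.int64)      # toy 2x2 integer kernel
    print(conv2d_as_vdp(img, ker))
```

Production frameworks batch these patches (e.g., via im2col) rather than looping, but the size of each VDP is exactly the quantity that the accelerators discussed above trade off against input/weight precision.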
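The paper's OSM is an optical device and its design is not reproduced here; the following is only a minimal software sketch of the stochastic-computing principle it builds on, in which a value is encoded as the 1-density of a bitstream and multiplication reduces to a bitwise AND. The helper names (to_bitstream, sc_multiply, sc_vdp), the unipolar encoding, and the stream length are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def to_bitstream(value, bits=8, length=2**16):
    """Encode an unsigned integer in [0, 2**bits) as a unipolar stochastic
    bitstream whose probability of a 1 equals value / 2**bits."""
    return rng.random(length) < (value / 2 ** bits)

def sc_multiply(x, w, bits=8, length=2**16):
    """Estimate x * w by AND-ing two independent bitstreams; the fraction of
    1s in the result approximates (x / 2**bits) * (w / 2**bits)."""
    ones = np.count_nonzero(to_bitstream(x, bits, length) & to_bitstream(w, bits, length))
    return ones / length * (2 ** bits) ** 2    # rescale back to the integer product

def sc_vdp(xs, ws, bits=8, length=2**16):
    """A vector-dot-product (VDP) built from stochastic multiplications,
    mirroring how quantized convolutions are lowered to VDP operations."""
    return sum(sc_multiply(x, w, bits, length) for x, w in zip(xs, ws))

if __name__ == "__main__":
    xs, ws = [12, 200, 87, 55], [34, 9, 150, 77]           # 8-bit quantized values
    print("exact VDP     :", sum(x * w for x, w in zip(xs, ws)))
    print("stochastic VDP:", round(sc_vdp(xs, ws)))
```

Longer bitstreams lower the estimator's variance at the cost of latency; that precision/latency trade-off is the "innate precision flexibility" of stochastic computing that the abstract refers to.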