Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support

Yue Zha, J. Li
{"title":"Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support","authors":"Yue Zha, J. Li","doi":"10.1145/3173162.3173167","DOIUrl":null,"url":null,"abstract":"With the recent trend of promoting Field-Programmable Gate Arrays (FPGAs) to first-class citizens in accelerating compute-intensive applications in networking, cloud services and artificial intelligence, FPGAs face two major challenges in sustaining competitive advantages in performance and energy efficiency for diverse cloud workloads: 1) limited configuration capability for supporting light-weight computations/on-chip data storage to accelerate emerging search-/data-intensive applications. 2) lack of architectural support to hide reconfiguration overhead for assisting virtualization in a cloud computing environment. In this paper, we propose a reconfigurable memory-oriented computing fabric, namely Liquid Silicon-Monona (L-Si), enabled by emerging nonvolatile memory technology i.e. RRAM, to address these two challenges. Specifically, L-Si addresses the first challenge by virtue of a new architecture comprising a 2D array of physically identical but functionally-configurable building blocks. It, for the first time, extends the configuration capabilities of existing FPGAs from computation to the whole spectrum ranging from computation to data storage. It allows users to better customize hardware by flexibly partitioning hardware resources between computation and memory, greatly benefiting emerging search- and data-intensive applications. To address the second challenge, L-Si provides scalable multi-context architectural support to minimize reconfiguration overhead for assisting virtualization. In addition, we provide compiler support to facilitate the programming of applications written in high-level programming languages (e.g. OpenCL) and frameworks (e.g. TensorFlow, MapReduce) while fully exploiting the unique architectural capability of L-Si. Our evaluation results show L-Si achieves 99.6% area reduction, 1.43× throughput improvement and 94.0% power reduction on search-intensive benchmarks, as compared with the FPGA baseline. For neural network benchmarks, on average, L-Si achieves 52.3× speedup, 113.9× energy reduction and 81% area reduction over the FPGA baseline. In addition, the multi-context architecture of L-Si reduces the context switching time to - 10ns, compared with an off-the-shelf FPGA (∼100ms), greatly facilitating virtualization.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3173162.3173167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

With the recent trend of promoting Field-Programmable Gate Arrays (FPGAs) to first-class citizens for accelerating compute-intensive applications in networking, cloud services, and artificial intelligence, FPGAs face two major challenges in sustaining competitive performance and energy efficiency across diverse cloud workloads: 1) limited configuration capability for supporting lightweight computation and on-chip data storage, which emerging search- and data-intensive applications demand; and 2) a lack of architectural support for hiding reconfiguration overhead, which hampers virtualization in a cloud computing environment. In this paper, we propose a reconfigurable memory-oriented computing fabric, namely Liquid Silicon-Monona (L-Si), enabled by emerging nonvolatile memory technology, i.e., RRAM, to address these two challenges. Specifically, L-Si addresses the first challenge with a new architecture comprising a 2D array of physically identical but functionally configurable building blocks. For the first time, it extends the configuration capability of existing FPGAs from computation alone to the whole spectrum ranging from computation to data storage. It allows users to better customize hardware by flexibly partitioning hardware resources between computation and memory, greatly benefiting emerging search- and data-intensive applications. To address the second challenge, L-Si provides scalable multi-context architectural support that minimizes reconfiguration overhead, thereby assisting virtualization. In addition, we provide compiler support to facilitate programming applications written in high-level languages (e.g., OpenCL) and frameworks (e.g., TensorFlow, MapReduce) while fully exploiting the unique architectural capability of L-Si. Our evaluation shows that, compared with the FPGA baseline, L-Si achieves 99.6% area reduction, 1.43× throughput improvement, and 94.0% power reduction on search-intensive benchmarks. For neural network benchmarks, L-Si achieves on average 52.3× speedup, 113.9× energy reduction, and 81% area reduction over the FPGA baseline. In addition, the multi-context architecture of L-Si reduces the context-switching time to ∼10ns, compared with ∼100ms for an off-the-shelf FPGA, greatly facilitating virtualization.
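
To make the two architectural ideas in the abstract concrete, the following is a minimal Python sketch, written purely as an illustrative model: (1) a 2D array of physically identical blocks whose role can be configured between computation and data storage, and (2) multi-context support, where several complete configurations are resident on-chip so a context switch is an index flip rather than a slow bitstream reload. All names here (Role, Fabric, partition, switch_context) are hypothetical and are not the paper's actual API.

    # A toy model of L-Si's configurable fabric, under the assumptions above.
    from enum import Enum

    class Role(Enum):
        LOGIC = "logic"    # block performs computation
        MEMORY = "memory"  # block stores data (RAM-like)
        TCAM = "tcam"      # block performs content-addressable search

    class Fabric:
        def __init__(self, rows, cols, num_contexts):
            self.rows, self.cols = rows, cols
            # One role grid per context; all contexts stay resident on-chip,
            # which is what makes switching cheap in this model.
            self.contexts = [
                [[Role.LOGIC for _ in range(cols)] for _ in range(rows)]
                for _ in range(num_contexts)
            ]
            self.active = 0  # index of the currently active context

        def configure(self, ctx, row, col, role):
            """Set one block's role within a (possibly inactive) context."""
            self.contexts[ctx][row][col] = role

        def partition(self, ctx, memory_fraction):
            """Flexibly split the fabric: the bottom rows become memory."""
            mem_rows = int(self.rows * memory_fraction)
            for r in range(self.rows - mem_rows, self.rows):
                for c in range(self.cols):
                    self.contexts[ctx][r][c] = Role.MEMORY

        def switch_context(self, ctx):
            """Model of a nanosecond-scale context switch: no configuration
            is reloaded, only the active-context index changes."""
            self.active = ctx

    # Usage: two tenants share one fabric with different compute/memory splits.
    fabric = Fabric(rows=8, cols=8, num_contexts=2)
    fabric.partition(ctx=0, memory_fraction=0.5)   # tenant 0: half memory
    fabric.partition(ctx=1, memory_fraction=0.25)  # tenant 1: mostly logic
    fabric.switch_context(1)                       # ~10ns in L-Si vs ~100ms on an FPGA
    print(fabric.contexts[fabric.active][0][0])    # Role.LOGIC

The point of the sketch is the asymmetry it models: reconfiguring a context (partition/configure) is expensive and done ahead of time, while switch_context touches only a single index, mirroring the ∼10ns versus ∼100ms gap the paper reports.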