ACM Transactions on Reconfigurable Technology and Systems: Latest Articles

Hardware-accelerated Real-time Drift-awareness for Robust Deep Learning on Wireless RF Data
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-09-12 DOI: 10.1145/3563394
Chanaka Ganewattha, Z. Khan, Janne J. Lehtomäki, M. Latva-aho
Abstract: Proactive and intelligent management of network resource utilization (RU) using deep learning (DL) can significantly improve the efficiency and performance of the next generation of wireless networks. However, variations in wireless RU are often affected by uncertain events and change points due to the deviations of real data distribution from that of the original training data. Such deviations, which are known as dataset drifts, can subsequently lead to a shift in the corresponding decision boundary, degrading the DL model prediction performance. To address these challenges, we present hardware-accelerated real-time radio frequency (RF) analytics and drift-awareness modules for robust DL predictions. We have prototyped the proposed design on a Zynq-7000 System-on-Chip that contains an FPGA and an embedded ARM processor. We have used the Xilinx Vivado design suite for synthesis and analysis of the HDL design for the proposed solution. To detect dataset drifts, the proposed solution adopts a distance-based technique on FPGA to quantify in real time the change between the prediction distribution obtained from DL predictions and the data distribution of input streaming samples. Using various performance metrics, we have extensively evaluated the performance of the proposed solution and shown that it can significantly improve the DL model robustness in the presence of dataset drifts.
Volume 16, Issue 1, pages 1-29
Citations: 0
FPGA Acceleration of Probabilistic Sentential Decision Diagrams with High-level Synthesis
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-09-06 DOI: 10.1145/3561514
Young-kyu Choi, Carlos Santillana, Yujia Shen, Adnan Darwiche, J. Cong
Abstract: Probabilistic Sentential Decision Diagrams (PSDDs) provide efficient methods for modeling and reasoning with probability distributions in the presence of massive logical constraints. PSDDs can also be synthesized from graphical models such as Bayesian networks (BNs), therefore offering a new set of tools for performing inference on these models (in time linear in the PSDD size). Despite these favorable characteristics of PSDDs, we have found multiple challenges in PSDD’s FPGA acceleration. Problems include limited parallelism, data dependency, and small pipeline iterations. In this article, we propose several optimization techniques to solve these issues with novel pipeline scheduling and parallelization schemes. We designed the PSDD kernel with a high-level synthesis (HLS) tool for ease of implementation and verified it on the Xilinx Alveo U250 board. Experimental results show that our methods improve the baseline FPGA HLS implementation performance by 2,200X and the multicore CPU implementation by 20X. The proposed design also outperforms state-of-the-art BN and Sum Product Network (SPN) accelerators that store the graph information in memory.
Volume 16, Issue 1, pages 1-22
Citations: 1
Jitter-based Adaptive True Random Number Generation Circuits for FPGAs in the Cloud
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-09-05 DOI: 10.1145/3487554
Xiang Li, Peter Stanwicks, George Provelengios, R. Tessier, Daniel E. Holcomb
Abstract: In this article, we present and evaluate a true random number generator (TRNG) design that is compatible with the restrictions imposed by cloud-based Field Programmable Gate Array (FPGA) providers such as Amazon Web Services (AWS) EC2 F1. Because cloud FPGA providers disallow the ring oscillator circuits that conventionally generate TRNG entropy, our design is oscillator-free and uses clock jitter as its entropy source. The clock jitter is harvested with a time-to-digital converter (TDC) and a controllable delay line that is continuously tuned to compensate for process, voltage, and temperature variations. After describing the design, we present and validate a stochastic model that conservatively quantifies its worst-case entropy. We deploy and model the design in the cloud on 60 EC2 F1 FPGA instances to ensure sufficient randomness is captured. TRNG entropy is further validated using NIST test suites, and experiments are performed to understand how the TRNG responds to on-die power attacks that disturb the FPGA supply voltage in the vicinity of the TRNG. After introducing and validating our basic TRNG design, we introduce and validate a new variant that uses four instances of a linkable sampling module to increase the entropy per sample and improve throughput. The new variant improves throughput by 250% at a modest 17% increase in CLB count.
Volume 16, Issue 1, pages 1-20
Citations: 5
Improving Energy Efficiency of CGRAs with Low-Overhead Fine-Grained Power Domains
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-08-27 DOI: 10.1145/3558394
Ankita Nayak, Kecheng Zhang, Rajsekhar Setaluri, Alex Carsello, Makai Mann, Christopher Torng, S. Richardson, Rick Bahr, P. Hanrahan, M. Horowitz, Priyanka Raina
Abstract: To effectively minimize static power for a wide range of applications, power domains for coarse-grained reconfigurable array (CGRA) architectures need to be more fine-grained than those found in a typical application-specific integrated circuit. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that reduces the area overhead of power domain boundary protection from around 9% to less than 1% without incurring any extra timing delay from the isolation cells. The conventional Unified Power Format based flow for power domain boundary protection does not support this design choice. Therefore, we create our own compiler-like passes that iteratively introduce the needed design changes, and formally verify the transformations using methods based on satisfiability modulo theories. These passes also let us optimize how we handle test and debug signals through the off tiles in the CGRA. Using our framework, we add power domains to a CGRA that we designed and taped out. The CGRA has 32 × 16 processing element and memory tiles and 4-MB secondary memory. We address the implementation challenges encountered due to the introduction of fine-grained power domains, including the addressing of the CGRA tiles, the power grid design, well substrate connections, and distribution of global signals. Our CGRA achieves up to 83% reduction in leakage power and 26% reduction in total power versus an identical CGRA without multiple power domains, for a range of image processing and machine learning applications.
Pages 1-28
Citations: 0
SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-08-23 DOI: 10.1145/3572547
Xingyu Tian, Z. Ye, Alec Lu, Licheng Guo, Yuze Chi, Zhenman Fang (Simon Fraser University; University of Electronic Science and Technology of China; University of California, Los Angeles)
Abstract: Stencil computation is one of the fundamental computing patterns in many application domains such as scientific computing and image processing. While there are promising studies that accelerate stencils on FPGAs, there lacks an automated acceleration framework to systematically explore both spatial and temporal parallelisms for iterative stencils that could be either computation-bound or memory-bound. In this article, we present SASA, a scalable and automatic stencil acceleration framework on modern HBM-based FPGAs. SASA takes the high-level stencil DSL and FPGA platform as inputs, automatically exploits the best spatial and temporal parallelism configuration based on our accurate analytical model, and generates the optimized FPGA design with the best parallelism configuration in TAPA high-level synthesis C++ as well as its corresponding host code. Compared to the state-of-the-art automatic stencil acceleration framework SODA, which only exploits temporal parallelism, SASA achieves an average speedup of 3.41× and up to 15.73× speedup on the HBM-based Xilinx Alveo U280 FPGA board for a wide range of stencil kernels.
Volume 16, Issue 1, pages 1-33
Citations: 6
Voltage Sensor Implementations for Remote Power Attacks on FPGAs
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-08-08 DOI: 10.1145/3555048
Shayan Moini, Aleksa Deric, Xiang Li, George Provelengios, W. Burleson, R. Tessier, Daniel E. Holcomb
Abstract: This article presents a study of two types of on-chip FPGA voltage sensors based on ring oscillators (ROs) and time-to-digital converters (TDCs), respectively. It has previously been shown that these sensors are often used to extract side-channel information from FPGAs without physical access. The performance of the sensors is evaluated in the presence of circuits that deliberately waste power, resulting in localized voltage drops. The effects of FPGA power supply features and sensor sensitivity in detecting voltage drops in an FPGA power distribution network (PDN) are evaluated for Xilinx Artix-7, Zynq 7000, and Zynq UltraScale+ FPGAs. We show that both sensor types are able to detect supply voltage drops, and that their measurements are consistent with each other. Our findings show that TDC-based sensors are more sensitive and can detect voltage drops that are shorter in duration, while RO sensors are easier to implement because calibration is not required. Furthermore, we present a new time-interleaved TDC design that sweeps the sensor phase. The new sensor generates data that can reconstruct voltage transients on the order of tens of picoseconds.
Volume 16, Issue 1, pages 1-21
Citations: 8
Toward Software-like Debugging for FPGAs via Checkpointing and Transaction-based Co-Simulation
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-08-01 DOI: 10.1145/3552521
Sameh Attia, V. Betz
Abstract: Checkpoint-based debugging flows have recently been developed that allow the user to move the design state back and forth between an FPGA and a simulator. They provide a software-like debugging experience by combining the speed of hardware execution and the full visibility of simulation. However, they assume the entire system state can be moved to a simulator, limiting them to self-contained systems. In this article, we present StateLink, a transaction-based co-simulation framework that allows part of the system (the task) to run in a simulator and still interact with other system components that reside in hardware. StateLink allows tasks to remain connected to and active in the overall hardware system after their state is moved to a simulator. This extends the functionality of checkpoint-based debugging frameworks to designs with external I/Os and significantly speeds up the simulation of tasks that are part of a large system. StateLink typically adds no timing overhead and a modest hardware area overhead. The total area overhead of using the proposed flow on a Memcached system is only 13%. This flow allows the user to benefit from both the hardware speedup of ∼1M× and the StateLink speedup of up to 44× versus full system simulation.
Pages 1-24
Citations: 1
Efficient Compilation and Mapping of Fixed Function Combinational Logic onto Digital Signal Processors Targeting Neural Network Inference and Utilizing High-level Synthesis
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-07-30 DOI: 10.1145/3559543
Soheil Nazar Shahsavani, A. Fayyazi, M. Nazemi, M. Pedram
Abstract: Recent efforts for improving the performance of neural network (NN) accelerators that meet today’s application requirements have given rise to a new trend of logic-based NN inference relying on fixed function combinational logic. Mapping such large Boolean functions with many input variables and product terms to digital signal processors (DSPs) on field-programmable gate arrays (FPGAs) needs a novel framework considering the structure and reconfigurability of DSP blocks during this process. The proposed methodology in this article maps the fixed function combinational logic blocks to a set of Boolean functions where Boolean operations corresponding to each function are mapped to DSP devices rather than look-up tables on the FPGAs to take advantage of the high performance, low latency, and parallelism of DSP blocks. This article also presents an innovative design and optimization methodology for compilation and mapping of NNs, utilizing fixed function combinational logic to DSPs on FPGAs employing high-level synthesis flow. Our experimental evaluations across several datasets and selected NNs demonstrate the comparable performance of our framework in terms of the inference latency and output accuracy compared to prior art FPGA-based NN accelerators employing DSPs.
Volume 16, Issue 1, pages 1-25
Citations: 2
A Scalable Many-core Overlay Architecture on an HBM2-enabled Multi-Die FPGA
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-07-20 DOI: 10.1145/3547657
Riadh Ben Abdelhamid, Y. Yamaguchi, T. Boku
Abstract: The overlay architecture makes it possible to raise the abstraction level of hardware design and enhances hardware-accelerated applications’ portability. In FPGAs, there is a growing awareness of the overlay structure as typified by many-core architecture. It works in theory; however, it is difficult in practice, because it is beset with serious design issues. For example, FPGAs are bigger than before, which exacerbates the place-and-route problem. Besides, with advancing packaging technology such as silicon interposers, a single FPGA is actually the sum of several small-to-middle FPGAs. Thus, tightly coupled many-core designs will face the covert issue that the wires among the regions are extremely restricted. This article proposes efficient essential processing elements, micro-architecture design, and the interconnect architecture toward a scalable many-core overlay design. In particular, our work proposes a novel compact buffering technique to reduce memory resource utilization in tightly connected overlays while preserving computational efficiency. This technique reduces the utilization of BlockRAM to nearly 50% while achieving a best-case computational efficiency of 91.93% in a three-dimensional Jacobi benchmark. Besides, the proposed enhancements led to around 2× and 3× improvement in performance and power efficiency, respectively. Moreover, the improved scalability allowed increasing compute resources and delivering around 4× better performance and power efficiency, as compared to the baseline Dynamically Re-programmable Architecture of Gather-scatter Overlay Nodes overlay.
Pages 1-33
Citations: 1
Near-memory Computing on FPGAs with 3D-stacked Memories: Applications, Architectures, and Optimizations
IF 2.3, Zone 4, Computer Science
ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2022-07-18 DOI: 10.1145/3547658
Veronia Iskandar, M. A. E. Ghany, Diana Göhringer
Abstract: The near-memory computing (NMC) paradigm has transpired as a promising method for overcoming the memory wall challenges of future computing architectures. Modern systems integrating 3D-stacked DRAM memory can be leveraged to prevent unnecessary data movement between the main memory and the CPU. FPGA vendors have started introducing 3D memories to their products in an effort to remain competitive on bandwidth requirements of modern memory-intensive applications. Recent NMC proposals target various types of data processing workloads such as graph processing, MapReduce, sorting, machine learning, and database analytics. In this article, we conduct a literature survey on previous proposals of NMC systems on FPGAs integrated with 3D memories. By leveraging the high bandwidth offered from such memories together with specifically designed hardware, FPGA architectures have become a competitor to GPU solutions in terms of speed and energy efficiency. Various FPGA-based NMC designs have been proposed with software and hardware optimization methods to achieve high performance and energy efficiency. Our review investigates various aspects of NMC designs such as platforms, architectures, workloads, and tools. We identify the key challenges and open issues with future research directions.
Pages 1-32
Citations: 4