ACM Transactions on Reconfigurable Technology and Systems最新文献_第4页

Covert-channels in FPGA-enabled SmartSSDs 支持fpga的smartssd中的转换通道

IF 2.3 4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-12-04 DOI: 10.1145/3635312

Theodoros Trochatos, Anthony Etim, Jakub Szefer

{"title":"Covert-channels in FPGA-enabled SmartSSDs","authors":"Theodoros Trochatos, Anthony Etim, Jakub Szefer","doi":"10.1145/3635312","DOIUrl":"https://doi.org/10.1145/3635312","url":null,"abstract":"Cloud computing providers today offer access to a variety of devices, which users can rent and access remotely in a shared setting. Among these devices are SmartSSDs, which a solid-state disks (SSD) augmented with an FPGA, enabling users to instantiate custom circuits within the FPGA, including potentially malicious circuits for power and temperature measurement. Normally, cloud users have no remote access to power and temperature data, but with SmartSSDs they could abuse the FPGA component to instantiate circuits to learn this information. Additionally, custom power waster circuits can be instantiated within the FPGA. This paper shows for the first time that by leveraging ring oscillator sensors and power wasters, numerous covert-channels in FPGA-enabled SmartSSDs could be used to transmit information. This work presents two channels in single-tenant setting (SmartSSD is used by one user at a time) and two channels in multi-tenant setting (FPGA and SSD inside SmartSSD is shared by different users). The presented covert channels can reach close to 100% accuracy. Meanwhile, bandwidth of the channels can be easily scaled by cloud users renting more SmartSSDs as the bandwidth of the covert channels is proportional to number of SmartSSD used.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"64 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138541630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs 跨越时间和空间:Senju在单个和多个fpga上缩放迭代模板循环加速器的方法

IF 2.3 4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-11-29 DOI: 10.1145/3634920

Emanuele Del Sozzo, Davide Conficconi, Kentaro Sano

{"title":"Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs","authors":"Emanuele Del Sozzo, Davide Conficconi, Kentaro Sano","doi":"10.1145/3634920","DOIUrl":"https://doi.org/10.1145/3634920","url":null,"abstract":"Stencil-based applications play an essential role in high-performance systems as they occur in numerous computational areas, such as partial differential equation solving. In this context, Iterative Stencil Loops (ISLs) represent a prominent and well-known algorithmic class within the stencil domain. Specifically, ISL-based calculations iteratively apply the same stencil to a multi-dimensional point grid multiple times or until convergence. However, due to their iterative and intensive nature, ISLs are highly performance-hungry, demanding specialized solutions. Here, Field Programmable Gate Arrays (FPGAs) represent a valid architectural choice as they enable the design of custom, parallel, and scalable ISL accelerators. Besides, the regular structure of ISLs makes them an ideal candidate for automatic optimization and generation flows. For these reasons, this paper introduces Senju, an automation framework for the design of highly parallel ISL accelerators targeting single-/multi-FPGA systems. Given an input description, Senju automates the entire design process and provides accurate performance estimations. The experimental evaluation shows remarkable and scalable results, outperforming single- and multi-FPGA literature approaches under different metrics. Finally, we present a new analysis of temporal and spatial parallelism trade-offs in a real-case scenario and discuss our performance through a single- and novel specialized multi-FPGA formulation of the Roofline Model.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"42 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

On the Malicious Potential of Xilinx’ Internal Configuration Access Port (ICAP) Xilinx内部配置访问端口(ICAP)的恶意漏洞分析

IF 2.3 4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-11-17 DOI: 10.1145/3633204

Nils Albartus, Maik Ender, Jan-Niklas Möller, Marc Fyrbiak, Christof Paar, Russell Tessier

{"title":"On the Malicious Potential of Xilinx’ Internal Configuration Access Port (ICAP)","authors":"Nils Albartus, Maik Ender, Jan-Niklas Möller, Marc Fyrbiak, Christof Paar, Russell Tessier","doi":"10.1145/3633204","DOIUrl":"https://doi.org/10.1145/3633204","url":null,"abstract":"FPGAs have become increasingly popular in computing platforms. With recent advances in bitstream format reverse engineering, the scientific community has widely explored static FPGA security threats. For example, it is now possible to convert a bitstream to a netlist, revealing design information, and apply modifications to the static bitstream based on this knowledge. However, a systematic study of the influence of the bitstream format understanding in regards to the security aspects of the dynamic configuration process, particularly for Xilinx’s Internal Configuration Access Port (ICAP), is lacking. This paper fills this gap by comprehensively analyzing the security implications of ICAP interfaces, which primarily support dynamic partial reconfiguration. We delve into the Xilinx bitstream file format, identify misconceptions in official documentation, and propose novel configuration (attack) primitives based on dynamic reconfiguration, i.e., create/read/update/delete circuits in the FPGA, without requiring pre-definition during the design phase. Our primitives are consolidated in a novel Stealthy Reconfigurable Adaptive Trojan (<monospace>STRAT</monospace>) framework to conceal Trojans and evade state-of-the-art netlist reverse engineering methods. As FPGAs become integral to modern cloud computing, this research presents crucial insights on potential security risks, including the possibility of a malicious tenant or provider altering or spying on another tenant’s configuration undetected.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"83 4","pages":""},"PeriodicalIF":2.3,"publicationDate":"2023-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks 二值神经网络硬件效率的量化与优化

4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-11-07 DOI: 10.1145/3631610

Geng Yang, Jie Lei, Zhenman Fang, Yunsong li, Jiaqing Zhang, Weiying Xie

{"title":"HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks","authors":"Geng Yang, Jie Lei, Zhenman Fang, Yunsong li, Jiaqing Zhang, Weiying Xie","doi":"10.1145/3631610","DOIUrl":"https://doi.org/10.1145/3631610","url":null,"abstract":"Binary neural network (BNN), where both the weight and the activation values are represented with one bit, provides an attractive alternative to deploy highly efficient deep learning inference on resource-constrained edge devices. However, our investigation reveals that, to achieve satisfactory accuracy gains, state-of-the-art (SOTA) BNNs, such as FracBNN and ReActNet, usually have to incorporate various auxiliary floating-point components and increase the model size, which in turn degrades the hardware performance efficiency. In this paper, we aim to quantify such hardware inefficiency in SOTA BNNs and further mitigate it with negligible accuracy loss. First, we observe that the auxiliary floating-point (AFP) components consume an average of 93% DSPs, 46% LUTs, and 62% FFs, among the entire BNN accelerator resource utilization. To mitigate such overhead, we propose a novel algorithm-hardware co-design, called FuseBNN , to fuse those AFP operators without hurting the accuracy. On average, FuseBNN reduces AFP resource utilization to 59% DSPs, 13% LUTs, and 16% FFs. Second, SOTA BNNs often use the compact MobileNetV1 as the backbone network but have to replace the lightweight 3 × 3 depth-wise convolution (DWC) with the 3 × 3 standard convolution (SC, e.g., in ReActNet and our ReActNet-adapted BaseBNN) or even more complex fractional 3 × 3 SC (e.g., in FracBNN) to bridge the accuracy gap. As a result, the model parameter size is significantly increased and becomes 2.25 × larger than that of the 4-bit direct quantization with the original DWC (4-Bit-Net); the number of multiply-accumulate operations is also significantly increased so that the overall LUT resource usage of BaseBNN is almost the same as that of 4-Bit-Net. To address this issue, we propose HyBNN , where we binarize depth-wise separation convolution (DSC) blocks for the first time to decrease the model size and incorporate 4-bit DSC blocks to compensate for the accuracy loss. For the ship detection task in synthetic aperture radar imagery on the AMD-Xilinx ZCU102 FPGA, HyBNN achieves a detection accuracy of 94.8% and a detection speed of 615 frames per second (FPS), which is 6.8 × faster than FuseBNN+ (94.9% accuracy) and 2.7 × faster than 4-Bit-Net (95.9% accuracy). For image classification on the CIFAR-10 dataset on the AMD-Xilinx Ultra96-V2 FPGA, HyBNN achieves 1.5 × speedup and 0.7% better accuracy over SOTA FracBNN.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"279 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135475095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Eciton: Very Low-Power Recurrent Neural Network Accelerator for Real-Time Inference at the Edge 用于边缘实时推理的极低功耗递归神经网络加速器

4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-11-01 DOI: 10.1145/3629979

Jeffrey Chen, Sang-Woo Jun, Sehwan Hong, Warrick He, Jinyeong Moon

{"title":"Eciton: Very Low-Power Recurrent Neural Network Accelerator for Real-Time Inference at the Edge","authors":"Jeffrey Chen, Sang-Woo Jun, Sehwan Hong, Warrick He, Jinyeong Moon","doi":"10.1145/3629979","DOIUrl":"https://doi.org/10.1145/3629979","url":null,"abstract":"This paper presents Eciton, a very low-power recurrent neural network accelerator for time series data within low-power edge sensor nodes, achieving real-time inference with a power consumption of 17 mW under load. Eciton reduces memory and chip resource requirements via 8-bit quantization and hard sigmoid activation, allowing the accelerator as well as the recurrent neural network model parameters to fit in a low-cost, low-power Lattice iCE40 UP5K FPGA. We evaluate Eciton on multiple, established time-series classification applications including predictive maintenance of mechanical systems, sound classification, and intrusion detection for IoT nodes. Binary and multi-class classification edge models are explored, demonstrating that Eciton can adapt to a variety of deployable environments and remote use cases. Eciton demonstrates real-time processing at a very low power consumption with minimal loss of accuracy on multiple inference scenarios with differing characteristics, while achieving competitive power efficiency against the state-of-the-art of similar scale. We show that the addition of this accelerator actually reduces the power budget of the sensor node by reducing power-hungry wireless transmission. The resulting power budget of the sensor node is small enough to be powered by a power harvester, potentially allowing it to run indefinitely without a battery or periodic maintenance.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"174 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135371182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Designing Deep Learning Models on FPGA with Multiple Heterogeneous Engines 基于FPGA的多异构引擎深度学习模型设计

4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-10-10 DOI: 10.1145/3615870

Miguel Reis, Mário Véstias, Horácio Neto

{"title":"Designing Deep Learning Models on FPGA with Multiple Heterogeneous Engines","authors":"Miguel Reis, Mário Véstias, Horácio Neto","doi":"10.1145/3615870","DOIUrl":"https://doi.org/10.1145/3615870","url":null,"abstract":"Deep learning models are becoming more complex and heterogeneous with new layer types to improve their accuracy. This brings a considerable challenge to the designers of accelerators of deep neural networks. There have been several architectures and design flows to map deep learning models on hardware, but they are limited to a particular model and/or layer types. Also, the architectures generated by these tools target, in general, high-performance devices, not appropriate for embedded computing. This paper proposes a multi-engine architecture and a design flow to implement deep learning models on FPGA. The hardware design uses high-level synthesis to allow design space exploration. The architecture is scalable and therefore applicable to any density FPGAs. The architecture and design flow were applied to the development of a hardware/software system for image classification with ResNet50, object detection with YOLOv3-Tiny and image segmentation with Deeplabv3+. The system was tested in a low-density Zynq UltraScale+ ZU3EG FPGA to show its scalability. The results show that the proposed multi-engine architecture generates efficient accelerators. An accelerator of ResNet50 with a 4-bit quantization achieves 67 FPS, and the object detector with YOLOv3-Tiny with a throughput of 36 FPS and the image segmentation application achieves 1.4 FPS.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136353080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Strega : An HTTP Server for FPGAs Strega:用于fpga的HTTP服务器

4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-10-10 DOI: 10.1145/3611312

Fabio Maschi, Gustavo Alonso

{"title":"<scp>Strega</scp> : An HTTP Server for FPGAs","authors":"Fabio Maschi, Gustavo Alonso","doi":"10.1145/3611312","DOIUrl":"https://doi.org/10.1145/3611312","url":null,"abstract":"The computer architecture landscape is being reshaped by the new opportunities, challenges and constraints brought by the cloud. On the one hand, high-level applications profit from specialised hardware to boost their performance and reduce deployment costs. On the other hand, cloud providers maximise the CPU time allocated to client applications by offloading infrastructure tasks to hardware accelerators. While it is well understood how to do this for, e.g., network function virtualisation and protocols such as TCP/IP, support for higher networking layers is still largely missing, limiting the potential of accelerators. In this paper, we present S trega , an open-source 1 light-weight HTTP server that enables crucial functionality such as FPGA-accelerated functions being called through a RESTful protocol (FPGA-as-a-Function). Our experimental analysis shows that a single S trega node sustains a throughput of 1.7 M HTTP requests per second with an end-to-end latency as low as 16 μ s, outperforming nginx running on 32 vCPUs in both metrics, and can even be an alternative to the traditional OpenCL flow over the PCIe bus. Through this work, we pave the way for running microservices directly on FPGAs, bypassing CPU overhead and realising the full potential of FPGA acceleration in distributed cloud applications.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136295278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Reprogrammable non-linear circuits using ReRAM for NN accelerators 使用ReRAM用于神经网络加速器的可编程非线性电路

4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-10-10 DOI: 10.1145/3617894

Rafael Fão de Moura, Luigi Carro

{"title":"Reprogrammable non-linear circuits using ReRAM for NN accelerators","authors":"Rafael Fão de Moura, Luigi Carro","doi":"10.1145/3617894","DOIUrl":"https://doi.org/10.1145/3617894","url":null,"abstract":"As the massive usage of Artificial Intelligence (AI) techniques spreads in the economy, researchers are exploring new techniques to reduce the energy consumption of Neural Network (NN) applications, especially as the complexity of NNs continues to increase. Using analog Resistive RAM (ReRAM) devices to compute Matrix-Vector Multiplication (MVM) in O (1) time complexity is a promising approach, but it’s true that these implementations often fail to cover the diversity of nonlinearities required for modern NN applications. In this work, we propose a novel approach where ReRAMs themselves can be reprogrammed to compute not only the required matrix multiplications, but also the activation functions, softmax, and pooling layers, reducing energy in complex NNs. This approach offers more versatility for researching novel NN layouts compared to custom logic. Results show that our device outperforms analog and digital Field Programmable approaches by up to 8.5x in experiments on real-world human activity recognition and language modeling datasets with Convolutional Neural Networks (CNNs), Generative Pre-trained Transformer (GPT), and Long Short-Term Memory (LSTM) models.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136295688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

FDRA: A Framework for Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism FDRA:一个支持多级并行的动态可重构加速器框架

4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-10-10 DOI: 10.1145/3614224

Yunhui Qiu, Yiqing Mao, Xuchen Gao, Sichao Chen, Jiangnan Li, Wenbo Yin, Lingli Wang

{"title":"FDRA: A Framework for Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism","authors":"Yunhui Qiu, Yiqing Mao, Xuchen Gao, Sichao Chen, Jiangnan Li, Wenbo Yin, Lingli Wang","doi":"10.1145/3614224","DOIUrl":"https://doi.org/10.1145/3614224","url":null,"abstract":"Coarse-grained reconfigurable architectures (CGRAs) have emerged as promising accelerators due to their high flexibility and energy efficiency. However, existing open-source works often lack integration of CGRAs with CPU systems and corresponding toolchains. Moreover, there is rare support for the accelerator instruction pipelining to overlap data communication, computation, and configuration across multiple tasks. In this paper, we propose FDRA, an open-source exploration framework for a heterogeneous system-on-chip (SoC) with a RISC-V processor and a dynamically reconfigurable accelerator (DRA) supporting loop, instruction, and task levels of parallelism. FDRA encompasses parameterized SoC modeling, Verilog generation, source-to-source application code transformation using frontend and DRA compilers, SoC simulation, and FPGA prototyping. FDRA incorporates the extraction of periodic accumulative operators and multidimensional linear load/store operators from nested loops. The DRA enables accessing the shared L2 cache with virtual addresses and supports direct memory access (DMA) with arbitrary start addresses and data lengths. Integrated into the RISC-V Rocket SoC, our DRA achieves a remarkable 55 × acceleration for loop kernels and improves energy efficiency by 29 ×. Compared to state-of-the-art RISC-V vector units, our DRA demonstrates a 2.9 × speed improvement and 3.5 × greater energy efficiency. In contrast to previous CGRA+RISC-V SoCs, our SoC achieves a minimum speedup of 5.2 ×.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136295683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Constraint-Aware Multi-Technique Approximate High-Level Synthesis for FPGAs fpga约束感知多技术近似高级综合

4区计算机科学

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-10-09 DOI: 10.1145/3624481

Marcos T. Leipnitz, Gabriel L. Nazar

{"title":"Constraint-Aware Multi-Technique Approximate High-Level Synthesis for FPGAs","authors":"Marcos T. Leipnitz, Gabriel L. Nazar","doi":"10.1145/3624481","DOIUrl":"https://doi.org/10.1145/3624481","url":null,"abstract":"Numerous approximate computing (AC) techniques have been developed to reduce the design costs in error-resilient application domains, such as signal and multimedia processing, data mining, machine learning, and computer vision, to trade-off computation accuracy with area and power savings or performance improvements. Selecting adequate techniques for each application and optimization target is complex but crucial for high-quality results. In this context, Approximate High-Level Synthesis (AHLS) tools have been proposed to alleviate the burden of hand-crafting approximate circuits by automating the exploitation of AC techniques. However, such tools are typically tied to a specific approximation technique or a difficult-to-extend set of techniques whose exploitation is not fully automated or steered by optimization targets. Therefore, available AHLS tools overlook the benefits of expanding the design space by mixing diverse approximation techniques toward meeting specific design objectives with minimum error. In this work, we propose an AHLS design methodology for FPGAs that automatically identifies efficient combinations of multiple approximation techniques for different applications and design constraints. Compared to single-technique approaches, decreases of up to 30% in mean squared error and absolute increases of up to 6.5% in percentage accuracy were obtained for a set of image, video, signal processing and machine learning benchmarks.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135043608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0