{"title":"Introduction to the Special Section on Security in FPGA-accelerated Cloud and Datacenters","authors":"C. Bobda, R. Tessier, Ken Eguro, R. Kastner","doi":"10.1145/3352060","DOIUrl":"https://doi.org/10.1145/3352060","url":null,"abstract":"The rapid advance of cloud computing platforms has made these computing resources a vital infrastructure for many application developers. For a modest fee, scalable and diverse compute components are available to application developers on demand, eliminating the need for large computer hardware investments by organizations. Although cloud computing platforms have included microprocessors and graphics processing units (GPUs) for many years, the availability of field-programmable gate arrays (FPGAs) in these platforms has only become prevalent in the past few years. Amazon, Baidu, and Maxeler now expose FPGAs to application developers in their cloud infrastructures. The integration of FPGAs in Microsoft Catapult to accelerate various tasks, including Bing, has led to a 2× performance speed-up versus processor-only implementation with only a 30% increase in energy. Intel recently announced in-package FPGA integration in Xeon multi-core processors. The use of FPGAs in the cloud raises a series of important security issues regarding their use, since the devices have traditionally been used by a single user in a closed environment. FPGA use by many cloud users over time, and potentially by multiple independent users at the same time, opens up a number of attack vectors in which FPGA devices could be damaged, computation could be attacked leading to incorrect results, computation results could be snooped, or covert communication channels in the FPGA could be developed. Research identifying these issues and developing countermeasures is still at an early stage, although interest is increasing. 
To highlight early work in these areas, we initiated a call for a special issue of TRETS with the topic of security in FPGA-accelerated cloud and datacenters. After an initial evaluation of submissions for the special issue for quality and relevance, eight papers were selected for review. At least three reviewers who are experts in security and FPGAs evaluated each manuscript during multiple rounds of review. Ultimately, four high-quality papers related to the security of FPGAs in the cloud were selected for inclusion in the special issue. The following summaries provide high-level views of these manuscripts and a brief analysis of their contributions:","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129865307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FRoC 2.0","authors":"Ibrahim Ahmed, Shuze Zhao, James Meijers, O. Trescases, Vaughn Betz","doi":"10.1145/3354188","DOIUrl":"https://doi.org/10.1145/3354188","url":null,"abstract":"In earlier technology nodes, FPGAs had low power consumption compared to other compute chips such as CPUs and GPUs. However, in the 14nm technology node, FPGAs are consuming unprecedented power in the 100+W range, making power consumption a pressing concern. To reduce FPGA power consumption, several researchers have proposed deploying dynamic voltage scaling. While the previously proposed solutions show promising results, they have difficulty guaranteeing safe operation at reduced voltages for applications that use the FPGA hard blocks. In this work, we present the first DVS solution that is able to fully handle FPGA applications that use BRAMs. Our solution not only robustly tests the soft logic component of the application but also tests all components connected to the BRAMs. We extend a previously proposed CAD tool, FRoC, to automatically generate calibration bitstreams that are used to measure the application’s critical path delays on silicon. The calibration bitstreams also include testers that ensure all used SRAM cells operate safely while scaling Vdd. 
We experimentally show that using our DVS solution we can save 32% of the total power consumed by a discrete Fourier transform application running with the fixed nominal supply voltage and clocked at the Fmax reported by static timing analysis.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"11 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123682825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Inference over Decision Tree Ensembles on Clusters of FPGAs","authors":"Muhsen Owaida, Amit Kulkarni, G. Alonso","doi":"10.1145/3340263","DOIUrl":"https://doi.org/10.1145/3340263","url":null,"abstract":"Given the growth in data inputs and application complexity, it is often the case that a single hardware accelerator is not enough to solve a given problem. In particular, the computational demands and I/O of many tasks in machine learning often require a cluster of accelerators to make a relevant difference in performance. In this article, we explore the efficient construction of FPGA clusters using inference over Decision Tree Ensembles as the target application. The article explores several levels of the problem: (1) a lightweight inter-FPGA communication protocol and routing layer to facilitate the communication between the different FPGAs, (2) the data partitioning and distribution strategies maximizing performance, and (3) an in-depth analysis of how applications can be efficiently distributed over such a cluster. The experimental analysis shows that the resulting system can support inference over decision tree ensembles at a significantly higher throughput than that achieved by existing systems.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116904398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unrolling Ternary Neural Networks","authors":"Stephen Tridgell, M. Kumm, M. Hardieck, D. Boland, Duncan J. M. Moss, P. Zipf, Philip H. W. Leong","doi":"10.1145/3359983","DOIUrl":"https://doi.org/10.1145/3359983","url":null,"abstract":"The computational complexity of neural networks for large-scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, permitting usage in a large number of applications. This article demonstrates, for the case where the neural network architecture and ternary weight values are known a priori, that extremely high throughput implementations of neural network inference can be made by customising the datapath and routing to remove unnecessary computations and data movement. This approach is ideally suited to FPGA implementations as a specialized implementation of a trained network improves efficiency while still retaining generality with the reconfigurability of an FPGA. A VGG-style network with ternary weights and fixed point activations is implemented for the CIFAR10 dataset on Amazon’s AWS F1 instance. This article demonstrates how to remove 90% of the operations in convolutional layers by exploiting sparsity and compile-time optimizations. 
The implementation in hardware achieves 90.9 ± 0.1% accuracy and 122k frames per second, with a latency of only 29µs, which is the fastest CNN inference implementation reported so far on an FPGA.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"411 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126690342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leakier Wires","authors":"Ilias Giechaskiel, Ken Eguro, Kasper Bonne Rasmussen","doi":"10.1145/3322483","DOIUrl":"https://doi.org/10.1145/3322483","url":null,"abstract":"In complex FPGA designs, implementations of algorithms and protocols from third-party sources are common. However, the monolithic nature of FPGAs means that all sub-circuits share common on-chip infrastructure, such as routing resources. This presents an attack vector for all FPGAs that contain designs from multiple vendors, especially for FPGAs used in multi-tenant cloud environments, or integrated into multi-core processors. In this article, we show that “long” routing wires present a new source of information leakage on FPGAs, by influencing the delay of adjacent long wires. We show that the effect is measurable for both static and dynamic signals and that it can be detected using small on-board circuits. We characterize the channel in detail and show that it is measurable even when multiple competing circuits (including multiple long-wire transmitters) are present and can be replicated on different generations and families of Xilinx devices (Virtex 5, Virtex 6, Artix 7, and Spartan 7). We exploit the leakage to create a covert channel with 6kbps of bandwidth and 99.9% accuracy, and a side channel, which can recover signals kept constant for only 1.3μs, with an accuracy of more than 98.4%. 
Finally, we propose countermeasures to reduce the impact of this leakage.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128416632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Attacks and Defenses on FPGA-based Systems","authors":"Jiliang Zhang, G. Qu","doi":"10.1145/3340557","DOIUrl":"https://doi.org/10.1145/3340557","url":null,"abstract":"Field-programmable gate array (FPGA) is a kind of programmable chip that is widely used in many areas, including automotive electronics, medical devices, military and consumer electronics, and is gaining more popularity. Unlike the application specific integrated circuits (ASIC) design, an FPGA-based system has its own supply-chain model and design flow, which brings interesting security and trust challenges. In this survey, we review the security and trust issues related to FPGA-based systems from the market perspective, where we model the market with the following parties: FPGA vendors, foundries, IP vendors, EDA tool vendors, FPGA-based system developers, and end-users. For each party, we show the security and trust problems they need to be aware of and the associated solutions that are available. We also discuss some challenges and opportunities in the security and trust of FPGA-based systems used in large-scale cloud and datacenters.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133265749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel Congestion-estimation and Routability-prediction Methods based on Machine Learning for Modern FPGAs","authors":"Abeer Y. Al-Hyari, Ziad Abuowaimer, T. Martin, G. Grewal, S. Areibi, A. Vannelli","doi":"10.1145/3337930","DOIUrl":"https://doi.org/10.1145/3337930","url":null,"abstract":"Effectively estimating and managing congestion during placement can save substantial placement and routing runtime. In this article, we present a machine-learning model for accurately and efficiently estimating congestion during FPGA placement. Compared with the state-of-the-art machine-learning congestion-estimation model, our results show a 25% improvement in prediction accuracy. This makes our model competitive with congestion estimates produced using a global router. However, our model runs, on average, 291× faster than the global router. Overall, we are able to reduce placement runtimes by 17% and router runtimes by 19%. An additional machine-learning model is also presented that uses the output of the first congestion-estimation model to determine whether or not a placement is routable. 
This second model has an accuracy in the range of 93% to 98%, depending on the classification algorithm used to implement the learning model, and runtimes of a few milliseconds, thus making it suitable for inclusion in any placer with no worry of additional computational overhead.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130682919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Protection and Pay-per-use Licensing Scheme for On-cloud FPGA Circuit IPs","authors":"M. Elrabaa, Mohamed A. Al-Asli, M. Abu-Amara","doi":"10.1145/3329861","DOIUrl":"https://doi.org/10.1145/3329861","url":null,"abstract":"Using security primitives, a novel scheme for licensing hardware intellectual properties (HWIPs) on Field Programmable Gate Arrays (FPGAs) in public clouds is proposed. The proposed scheme enforces a pay-per-use model, allows HWIP's installation only on specific on-cloud FPGAs, and efficiently protects the HWIPs from being cloned, reverse engineered, or used without the owner's authorization by any party, including a cloud insider. It also provides protection for the users’ designs integrated with the HWIP on the same FPGA. This enables cloud tenants to license HWIPs in the cloud from the HWIP vendors at a relatively low price based on usage instead of paying the expensive unlimited HWIP license fee. The scheme includes a protocol for FPGA authentication, HWIP secure decryption, and usage by the clients without the need for the HWIP vendor to be involved or divulge their secret keys. A complete prototype test-bed implementation showed that the proposed scheme is very feasible with relatively low resource utilization. Experiments also showed that a HWIP could be licensed and set up in the on-cloud FPGA in 0.9s. 
This is 15 times faster than setting up the same HWIP from outside the cloud, which takes about 14s based on the average global Internet speed.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127312908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitigating Electrical-level Attacks towards Secure Multi-Tenant FPGAs in the Cloud","authors":"Jonas Krautter, Dennis R. E. Gnad, M. Tahoori","doi":"10.1145/3328222","DOIUrl":"https://doi.org/10.1145/3328222","url":null,"abstract":"A rising trend is the use of multi-tenant FPGAs, particularly in cloud environments, where partial access to the hardware is given to multiple third parties. This leads to new types of attacks in FPGAs, which operate not only on the logic level, but also on the electrical level through the common power delivery network. Since FPGAs are configured from the software-side, attackers are enabled to launch hardware attacks from software, impacting the security of an entire system. In this article, we show the first attempt of a countermeasure against attacks on the electrical level, which is based on a bitstream checking methodology. Bitstreams are translated back into flat technology mapped netlists, which are then checked for properties that indicate potential malicious runtime behavior of FPGA logic. Our approach can provide a metric of potential risk of the FPGA bitstream being used in active fault or passive side-channel attacks against other users of the FPGA fabric or the entire SoC platform.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117137560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel FPGA Implementation of a Time-to-Digital Converter Supporting Run-Time Estimation and Compensation","authors":"Van Luan Dinh, X. Nguyen, Hyuk-Jae Lee","doi":"10.1145/3322482","DOIUrl":"https://doi.org/10.1145/3322482","url":null,"abstract":"Time-to-digital converters (TDCs) are widely used in applications that require the measurement of the time interval between events. In previous designs using a feedback loop and an extended delay line, process-voltage-temperature (PVT) variation often decreases the accuracy of measurements. To overcome the loss of accuracy caused by PVT variation, this study proposes a novel design of a synthesizable TDC that employs run-time estimation and compensation of PVT variation. A delay line consisting of a series of buffers is used to detect the period of a ring oscillator designed to measure the time interval between two events. By comparing the detected period and the system clock, the variation of the oscillation period is compensated at run-time. The proposed TDC is successfully implemented by using a low-cost Xilinx Spartan-6 LX9 FPGA with a 50-MHz oscillator. Experimental results show that the proposed TDC is robust to PVT variation with a resolution of 19.1 ps. 
In comparison with previous designs, the proposed TDC achieves about five times better tradeoff in the area, resolution, and frequency of the reference clock.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121695508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}