Introduction to Special Issue on FPGAs in Data Centers

Ken Eguro, S. Neuendorffer, V. Prasanna, Hongbo Rong

ACM Transactions on Reconfigurable Technology and Systems (TRETS), published 2022-01-31. DOI: 10.1145/3493607
Abstract
Hardware accelerators have recently been used to augment the compute power of data centers and improve the performance of many applications, particularly latency-sensitive ones. Indeed, several commercial vendors now offer FPGAs in their cloud platforms. This special issue of ACM Transactions on Reconfigurable Technology and Systems presents advanced research on using FPGAs in data centers. The articles cover several topics, including the impact of terrestrial radiation; memory system optimization using FPGAs; use and management of network-accessible FPGAs; virtualization and runtime resource management for FPGAs; novel applications of FPGAs in data centers; FPGA IP cores for data center acceleration; latency and performance tradeoffs in using FPGAs for acceleration; and communication optimization using FPGAs.

In response to the call for papers, 21 papers were received. After a thorough review of these manuscripts following the ACM manuscript review guidelines, 13 papers were accepted. The accepted papers are grouped into two issues; this issue includes 10 of them.

The article "Elastic-DF: Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning" by Petrica et al. presents an automatic partitioning technique to maximize the performance and scalability of FPGA-based pipelined dataflow DNN inference accelerators on computing infrastructures consisting of multi-die, network-connected FPGAs. The article "xDNN: Inference for Deep Convolutional Neural Network" by D'Alberto et al. presents an end-to-end system for deep-learning inference on convolutional neural networks, based on a family of specialized hardware processors synthesized on FPGAs.

The article "Hardware Acceleration of High-Performance Computational Flow Dynamics Using High-Bandwidth Memory-enabled Field-programmable Gate Arrays" by Nane et al. studies the potential of using FPGAs in computational flow dynamics in the context of rapid advances in reconfigurable hardware, such as the growth in on-chip memory size, the increasing number of logic cells, and the integration of on-board high-bandwidth memories. The article "BurstZ+: Eliminating the Communication Bottleneck of Scientific Computing Accelerators via Accelerated Compression" by Jun et al. presents an accelerator platform that eliminates the communication bottleneck between PCIe-attached scientific computing accelerators and their host servers via hardware-optimized compression. The article "Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-unfriendly Applications on FPGAs" by Asiatici et al. presents an efficient on-chip memory system for applications such as graph analytics, designed to minimize the number of pipeline stalls. The article "NASCENT2: Generic Near-storage Sort Accelerator for Data Analytics on SmartSSD" by Salamat et al. presents an efficient algorithm for sorting database tables by partitioning the data into multiple smaller sort operations.
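The partition-then-merge strategy underlying accelerators such as NASCENT2 can be illustrated in software: split the input into fixed-size runs, sort each run independently (the step an accelerator would parallelize), then merge the sorted runs. This is a minimal, generic sketch of the idea, not the paper's actual hardware design; the function name and chunk size are illustrative assumptions.

```python
import heapq

def partitioned_sort(records, chunk_size):
    """Sort a large sequence by sorting fixed-size partitions
    independently, then merging the sorted runs.

    Illustrative only: a near-storage accelerator would sort the
    partitions in hardware, close to where the data resides."""
    # Sort each partition on its own (the parallelizable step).
    runs = [sorted(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]
    # Merge the sorted runs into one fully sorted output.
    return list(heapq.merge(*runs))

# Example: 8 keys sorted via two partitions of 4.
print(partitioned_sort([5, 3, 8, 1, 9, 2, 7, 4], chunk_size=4))
# → [1, 2, 3, 4, 5, 7, 8, 9]
```

Because each run is sorted independently, the per-partition work is embarrassingly parallel, and the final merge touches each element only once.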