Re-configurable, expandable, and cost-effective heterogeneous FPGA cluster approach for resource-constrained data analysis
Dulana Rupanetti, Hassan A. Salamy, Cheol-Hong Min, Kundan Nepal
International Journal of Parallel Emergent and Distributed Systems (Impact Factor 0.6; JCR Q4, Computer Science, Theory & Methods)
Journal Article, published 2022-06-15
DOI: 10.1080/17445760.2022.2085703 (https://doi.org/10.1080/17445760.2022.2085703)
Citations: 0
Abstract
Field programmable gate arrays (FPGAs) have become prevalent in recent years as an alternative to application-specific integrated circuits (ASICs) and as a potentially cheaper alternative to graphics processing units (GPUs). Introduced as a prototyping solution for ASICs, FPGAs are now popular in applications that require rapid data processing, such as artificial intelligence (AI) and machine learning (ML) models. As a relatively low-cost alternative to GPUs, FPGAs have the advantage that they can be reprogrammed for use in almost any data-driven application. In this work, we propose an easily scalable and cost-effective cluster-based co-processing system using FPGAs for ML and AI applications that is easily reconfigured to the requirements of each user application. The aim is to introduce a clustering system of FPGA boards to improve the efficiency of the training component of machine learning algorithms. Our proposed configuration makes it possible to build a cluster from relatively inexpensive FPGA development boards without expert knowledge of VHDL, Verilog, or the system designs related to FPGA development. The system consists of two parts, a computer-based host application that controls the cluster and an FPGA cluster connected through a high-speed Ethernet switch, and it allows users to customise and adapt the setup without much effort. The methods proposed in this paper allow any FPGA board with an Ethernet port to be used as part of the cluster, and allow the cluster to be scaled without bound. To demonstrate the flexibility and portability of the proposed work, a two-part experiment was conducted, one part on a homogeneous cluster and one on a heterogeneous cluster, with results compared against a desktop computer and combinations of FPGAs in two clusters. Data sets ranging from 60,000 to 14 million, including stroke prediction and COVID-19 data sets, were used in the experiments. Results suggest that the proposed system runs close to 70% faster than a traditional computer while achieving similar accuracy.
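The abstract does not say how the host application divides work among the boards. One plausible scheme for a heterogeneous cluster is to give each board a share of the training samples proportional to its relative throughput. The sketch below illustrates that assumed scheme only; `Board`, `relative_speed`, and `partition` are hypothetical names and are not taken from the published system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    """One cluster node. Both fields are hypothetical, for illustration only."""
    name: str
    relative_speed: float  # assumed throughput relative to the other boards


def partition(num_samples: int, boards: list[Board]) -> dict[str, range]:
    """Split sample indices 0..num_samples-1 into contiguous per-board ranges.

    Shares are proportional to relative_speed. Samples left over after
    rounding down go to the boards with the largest fractional remainders.
    """
    if not boards:
        raise ValueError("at least one board is required")
    if any(b.relative_speed < 0 for b in boards):
        raise ValueError("relative speeds must be non-negative")
    total = sum(b.relative_speed for b in boards)
    if total <= 0:
        raise ValueError("relative speeds must sum to a positive value")

    exact = [num_samples * b.relative_speed / total for b in boards]
    counts = [int(x) for x in exact]  # round each share down
    leftover = num_samples - sum(counts)

    # Hand out the remaining samples, largest fractional part first.
    order = sorted(range(len(boards)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1

    ranges: dict[str, range] = {}
    start = 0
    for board, count in zip(boards, counts):
        ranges[board.name] = range(start, start + count)
        start += count
    return ranges


if __name__ == "__main__":
    # Hypothetical three-board cluster; the speeds are made-up numbers.
    cluster = [Board("fpga-a", 1.0), Board("fpga-b", 1.0), Board("fpga-c", 2.0)]
    for name, idx in partition(60_000, cluster).items():
        print(name, idx.start, idx.stop)
```

In a homogeneous cluster all speeds are equal and the split reduces to near-equal chunks. A real host would also have to account for network transfer time over the Ethernet switch, which this sketch ignores.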