{"title":"fpga集群上决策树集成的分布式推理","authors":"Muhsen Owaida, Amit Kulkarni, G. Alonso","doi":"10.1145/3340263","DOIUrl":null,"url":null,"abstract":"Given the growth in data inputs and application complexity, it is often the case that a single hardware accelerator is not enough to solve a given problem. In particular, the computational demands and I/O of many tasks in machine learning often require a cluster of accelerators to make a relevant difference in performance. In this article, we explore the efficient construction of FPGA clusters using inference over Decision Tree Ensembles as the target application. The article explores several levels of the problem: (1) a lightweight inter-FPGA communication protocol and routing layer to facilitate the communication between the different FPGAs, (2) the data partitioning and distribution strategies maximizing performance, (3) and an in depth analysis on how applications can be efficiently distributed over such a cluster. The experimental analysis shows that the resulting system can support inference over decision tree ensembles at a significantly higher throughput than that achieved by existing systems.","PeriodicalId":162787,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Distributed Inference over Decision Tree Ensembles on Clusters of FPGAs\",\"authors\":\"Muhsen Owaida, Amit Kulkarni, G. Alonso\",\"doi\":\"10.1145/3340263\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given the growth in data inputs and application complexity, it is often the case that a single hardware accelerator is not enough to solve a given problem. In particular, the computational demands and I/O of many tasks in machine learning often require a cluster of accelerators to make a relevant difference in performance. In this article, we explore the efficient construction of FPGA clusters using inference over Decision Tree Ensembles as the target application. The article explores several levels of the problem: (1) a lightweight inter-FPGA communication protocol and routing layer to facilitate the communication between the different FPGAs, (2) the data partitioning and distribution strategies maximizing performance, (3) and an in depth analysis on how applications can be efficiently distributed over such a cluster. The experimental analysis shows that the resulting system can support inference over decision tree ensembles at a significantly higher throughput than that achieved by existing systems.\",\"PeriodicalId\":162787,\"journal\":{\"name\":\"ACM Transactions on Reconfigurable Technology and Systems (TRETS)\",\"volume\":\"65 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Reconfigurable Technology and Systems (TRETS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3340263\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3340263","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Distributed Inference over Decision Tree Ensembles on Clusters of FPGAs
Given the growth in data inputs and application complexity, it is often the case that a single hardware accelerator is not enough to solve a given problem. In particular, the computational demands and I/O of many tasks in machine learning often require a cluster of accelerators to make a relevant difference in performance. In this article, we explore the efficient construction of FPGA clusters using inference over Decision Tree Ensembles as the target application. The article explores several levels of the problem: (1) a lightweight inter-FPGA communication protocol and routing layer to facilitate the communication between the different FPGAs, (2) the data partitioning and distribution strategies maximizing performance, (3) and an in depth analysis on how applications can be efficiently distributed over such a cluster. The experimental analysis shows that the resulting system can support inference over decision tree ensembles at a significantly higher throughput than that achieved by existing systems.