Automated Generation and Orchestration of Stream Processing Pipelines on FPGAs
Kaspar Mätas, Kristiyan Manev, Joseph Powell, Dirk Koch
2022 International Conference on Field-Programmable Technology (ICFPT), 5 December 2022. DOI: 10.1109/ICFPT56656.2022.9974596
FPGAs have demonstrated substantial performance and energy efficiency advantages for workloads that fit a stream processing model with direct module-to-module communication. However, when the dataflow processing system must adapt to runtime conditions, current static acceleration solutions are limited. To better use FPGAs in dynamic scenarios, this paper proposes using partial reconfiguration to stitch together different physically implemented operator modules on-the-fly. Rather than using designated module slots, our system places all modules and routing wires into a shared region with more placement options to minimize fragmentation. Furthermore, we use a module library that provides different resource and performance trade-offs for faster execution while considering the configuration cost. Our system finds the optimal set of modules while scheduling multiple acceleration requests and managing all constraints transparently to the end user. We demonstrate that the middleware is fast enough to compose accelerator pipelines at runtime, with end-to-end execution times matching those of hand-crafted static systems when processing small datasets. For large datasets, we measured up to 7.2x faster execution over static systems when using our runtime methods. We exemplify our approach with database acceleration, where the entire dynamic FPGA acceleration flow is inferred directly from executing SQL queries.
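To make the module-library trade-off concrete, the following is a minimal sketch, not the paper's implementation: it assumes a simple cost model in which each operator has several hypothetical variants (names, throughputs, bitstream sizes, and the configuration-port bandwidth are all invented for illustration), and the runtime picks the combination that minimizes estimated streaming time plus partial-reconfiguration time.

# Toy sketch (assumed cost model, not the authors' code): choose one variant
# per pipeline stage so that streaming time + reconfiguration time is minimal.
from dataclasses import dataclass
from itertools import product

@dataclass
class Variant:
    name: str               # hypothetical identifier, e.g. "filter_4x"
    throughput_mbps: float  # sustained streaming throughput in MB/s
    config_bytes: int       # partial bitstream size that must be loaded

CONFIG_MBPS = 400.0         # assumed configuration-port bandwidth in MB/s

def pipeline_cost(variants, dataset_mb):
    # Pipeline throughput is limited by its slowest stage; configuration
    # cost is the time to load all partial bitstreams of the chosen set.
    stream_time = dataset_mb / min(v.throughput_mbps for v in variants)
    config_time = sum(v.config_bytes for v in variants) / (CONFIG_MBPS * 1e6)
    return stream_time + config_time

def choose_variants(library, dataset_mb):
    # library: one list of available Variants per operator in the pipeline.
    # Exhaustive search for clarity; a real scheduler would prune heavily.
    return min(product(*library), key=lambda vs: pipeline_cost(vs, dataset_mb))

Under such a model, small datasets favor variants with small bitstreams because configuration time dominates, while large datasets amortize the reconfiguration cost and favor high-throughput variants, which is consistent with the abstract's observation that the runtime approach pays off most on large datasets.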