Andreas Becher, S. Wildermann, Moritz Mühlenthaler, J. Teich
2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2016-11-01
DOI: https://doi.org/10.1109/ReConFig.2016.7857185
ReOrder: Runtime datapath generation for high-throughput multi-stream processing
Modern programmable FPGA-based SoCs that tightly couple a CPU with programmable logic enable on-demand hardware acceleration of stream processing by exploiting the available high input and output throughput and the reconfigurability in both software and hardware. In this paper, we present the concept and implementation of a hardware unit called ReOrder that serves as a converter for multiple parallel streams of data read from and written to an accelerator. Our technique and programmable design allow flexible data access and connect different stream processing accelerators independently of the host data layout. To achieve high accelerator throughput, it is necessary to determine an optimized datapath according to the accelerator's internal schedule of input and output data. We are concerned with an online setting in which either the data layout (e.g., in modern database systems) or the accelerator's operational mode changes dynamically. Therefore, an algorithm is required that can be used at runtime to maintain an optimized datapath configuration. We propose an efficient heuristic algorithm and a corresponding FPGA design that can translate arbitrary (multi-source) data layouts of the connected host system into any specified accelerator data stream at runtime within milliseconds.
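To make the layout-to-stream conversion concrete, the following is a minimal software sketch, not the paper's heuristic or its FPGA datapath. It assumes a host layout can be described as a map from (record, field) to a (source buffer, offset) pair, and that the accelerator schedule lists which fields it consumes in each cycle. All names (`row_store_layout`, `column_layout`, `build_gather_table`, `reorder`) and the example fields are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumption: one host element is addressed by (source buffer name, element offset).
Location = Tuple[str, int]
# Assumption: one logical data item is addressed by (record index, field name).
Item = Tuple[int, str]


def row_store_layout(fields: Sequence[str], records: int, source: str) -> Dict[Item, Location]:
    """Describe an interleaved (row-store) buffer: all fields of a record are adjacent."""
    width = len(fields)
    return {
        (r, f): (source, r * width + i)
        for r in range(records)
        for i, f in enumerate(fields)
    }


def column_layout(field: str, records: int, source: str) -> Dict[Item, Location]:
    """Describe a column-store buffer that holds a single field for every record."""
    return {(r, field): (source, r) for r in range(records)}


def build_gather_table(
    layout: Dict[Item, Location], schedule: Sequence[Sequence[Item]]
) -> List[List[Location]]:
    """For each schedule cycle, resolve the requested items to host locations."""
    return [[layout[item] for item in cycle] for cycle in schedule]


def reorder(
    sources: Dict[str, Sequence[Hashable]], table: Sequence[Sequence[Location]]
) -> List[List[Hashable]]:
    """Apply a gather table to the host buffers, yielding one output row per cycle."""
    return [[sources[src][off] for src, off in cycle] for cycle in table]


if __name__ == "__main__":
    # Two records: (id, price, qty) interleaved in "rows", discount in its own column.
    layout: Dict[Item, Location] = {}
    layout.update(row_store_layout(["id", "price", "qty"], 2, "rows"))
    layout.update(column_layout("discount", 2, "disc"))

    # Hypothetical accelerator schedule: price, qty, discount of one record per cycle.
    schedule = [[(r, "price"), (r, "qty"), (r, "discount")] for r in range(2)]

    table = build_gather_table(layout, schedule)
    stream = reorder({"rows": [1, 10, 2, 2, 20, 3], "disc": [5, 7]}, table)
    print(stream)  # [[10, 2, 5], [20, 3, 7]]
```

If the host layout or the accelerator mode changes, only the gather table has to be rebuilt, which loosely mirrors the online setting described above. The sketch ignores bus widths, burst alignment, and throughput optimization, which are the concerns the proposed heuristic addresses in hardware.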