Auto-Tuning of Raw Filters for FPGAs

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). Pub Date: 2022-08-01. DOI: 10.1109/FPL57034.2022.00036

Tobias Hahn, S. Wildermann, Jürgen Teich


Citations: 1

Abstract


Auto-Tuning of Raw Filters for FPGAs

Many Big Data applications process data streams in semi-structured formats such as JSON. A disadvantage of these formats, however, is that applications may spend a significant portion of their processing time unselectively parsing all data. As a remedy, so-called raw filters have been introduced, which aim to reduce the data load before the costly parsing stage. Since filtering unparsed data can itself become very costly, raw filters can be designed to filter approximately, in the sense that they allow false positives, so that they can be implemented efficiently. While previously proposed CPU-based solutions are restricted to string filtering, recently proposed FPGA approaches offer much more expressive raw filters that can also capture numbers and structural relationships. Yet, because of the variety of possible filters and the limited resources available on FPGAs, selecting optimal filters before deployment has been identified as a complex problem, which may force the selection of less expressive filters that consume fewer resources. Many Big Data applications (e.g., stream processing) operate on incoming real-time data over long, potentially unlimited time periods. As a consequence, the conditions for which such a filter was optimized can change after its deployment. In this context, this paper presents a new methodology that automatically adapts the hardware accelerator for raw filtering by means of dynamic hardware reconfiguration. Data is sampled on the fly during operation and used by an optimizer-in-the-loop to select and generate a raw filter with optimized selectivity for these data samples. As the optimizer has to take the resource costs of the hardware accelerator into account, we introduce models that estimate these costs in order to avoid performing a full synthesis. The filter selection problem can thus be solved within a few minutes, with results close to those of an accurate resource cost estimation. If the selectivity of a query changes over time, for example due to seasonal differences in the analysis of IoT data, the system can auto-tune its filter to adapt to the situation. Depending on the query and the variability of inherent data changes, significant improvements in the amount of filtered data are presented, resulting in a significant parsing speedup in comparison to a state-of-the-art non-adaptive approach.
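The idea of a raw filter and of choosing one from sampled data can be illustrated with a small software sketch. This is not the authors' FPGA design or optimizer: substring matching stands in for the hardware filter, and the query, the candidate filters, the cost numbers, and all function names (`raw_filter`, `pass_rate`, `select_filter`, `exact_query`) are hypothetical, chosen only for illustration.

```python
import json


def raw_filter(record: str, needles: list[str]) -> bool:
    """Approximate filter on unparsed text: pass if every needle occurs.

    It may pass records that the exact query later rejects (false positives).
    """
    return all(n in record for n in needles)


def exact_query(obj: dict) -> bool:
    """Hypothetical exact query, evaluated only after parsing."""
    return obj.get("sensor") == "temp" and obj.get("value", 0) > 30


def pass_rate(samples: list[str], needles: list[str]) -> float:
    """Fraction of sampled records that the raw filter lets through."""
    return sum(raw_filter(s, needles) for s in samples) / len(samples)


def select_filter(samples, candidates, budget):
    """Pick the candidate with the lowest pass rate whose cost fits the budget.

    candidates is a list of (needles, estimated_cost); the costs are made-up
    numbers standing in for an estimated hardware resource cost.
    """
    feasible = [(pass_rate(samples, n), cost, n)
                for n, cost in candidates if cost <= budget]
    if not feasible:
        return []  # an empty filter passes every record
    return min(feasible, key=lambda t: (t[0], t[1]))[2]


# Sampled raw records (hypothetical IoT data).
samples = [
    '{"sensor": "temp", "value": 35}',
    '{"sensor": "temp", "value": 12}',
    '{"sensor": "humidity", "value": 80}',
    '{"note": "temp sensor offline", "sensor": "door", "value": 1}',
]

# Candidate raw filters with assumed costs.
candidates = [
    (['temp'], 1),
    (['"temp"'], 2),
    (['"sensor"', '"temp"'], 4),
]

chosen = select_filter(samples, candidates, budget=2)

# Only records that pass the raw filter are parsed and checked exactly.
results = [obj
           for s in samples if raw_filter(s, chosen)
           for obj in [json.loads(s)] if exact_query(obj)]
```

In this sketch the second record passes the chosen raw filter but is rejected by the exact query after parsing, which is the kind of false positive the abstract describes. Re-running `select_filter` on fresh samples is a loose analogue of re-tuning the filter when the data changes.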

