Near Data Filtering for Distributed Database Systems

2018 Ninth International Green and Sustainable Computing Conference (IGSC). Pub Date: 2018-10-01. DOI: 10.1109/IGCC.2018.8752112

Zimeng Zhou, Xuan Sun, Jinghuan Yu, Sarana Nutanong, C. Xue


Citations: 0

Abstract



Over the past decade, data movement costs have dominated the execution time of data-intensive applications on distributed systems, and they are expected to become even more important in the future. Near data processing, which brings compute resources closer to the data source, is a straightforward way to reduce data movement. This paper explores near data processing in a generic distributed system to improve performance by reducing data movement. An efficient near data filtering solution is designed and implemented by introducing a filter layer that performs tuple-level near data filtering. To reduce the idle time of processing nodes and improve data transmission throughput, the proposed solution is extended to support block-level near data filtering by creating an index for each data block. Furthermore, to answer the question of when and how to perform near data filtering, this paper proposes an adaptive near data filtering solution that balances computation and data transmission throughput. Experimental results show that the proposed solutions are superior to the best existing method in most cases. The adaptive near data filtering solution achieves an average speedup factor of 4.59 for queries with low selectivity.
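The three ideas in the abstract (tuple-level filtering, block-level filtering through a per-block index, and an adaptive choice of when to filter) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say what kind of index is built or how the adaptive decision is made. The min/max summary per block, the sequential per-row cost model, and every name below (`Block`, `build_block`, `tuple_level_filter`, `block_level_filter`, `should_filter_near_data`) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical tuple layout: (key, payload)


@dataclass
class Block:
    """A data block with an assumed min/max summary of its key column."""
    rows: List[Row]
    min_key: int
    max_key: int


def build_block(rows: List[Row]) -> Block:
    """Build the per-block index once, when the block is written."""
    keys = [key for key, _ in rows]
    return Block(rows=rows, min_key=min(keys), max_key=max(keys))


def tuple_level_filter(blocks: List[Block], lo: int, hi: int) -> List[Row]:
    """Test every tuple of every block on the storage side."""
    return [row for b in blocks for row in b.rows if lo <= row[0] <= hi]


def block_level_filter(blocks: List[Block], lo: int, hi: int) -> Tuple[List[Row], int]:
    """Skip blocks whose key range cannot overlap [lo, hi].

    Returns the matching rows and the number of blocks actually scanned.
    """
    matches: List[Row] = []
    scanned = 0
    for b in blocks:
        if b.max_key < lo or b.min_key > hi:
            continue  # the index proves no tuple in this block can match
        scanned += 1
        matches.extend(row for row in b.rows if lo <= row[0] <= hi)
    return matches, scanned


def should_filter_near_data(selectivity: float,
                            filter_rows_per_s: float,
                            link_rows_per_s: float) -> bool:
    """Toy per-row cost model (an assumption, not the paper's model).

    `selectivity` is the fraction of rows that pass the predicate.
    Filtering pays off when filter time plus shipping the survivors
    is cheaper than shipping every row unfiltered.
    """
    ship_all = 1.0 / link_rows_per_s
    filter_then_ship = 1.0 / filter_rows_per_s + selectivity / link_rows_per_s
    return filter_then_ship < ship_all


# Four blocks of four rows each, with keys 0..15.
blocks = [build_block([(i, f"v{i}") for i in range(start, start + 4)])
          for start in (0, 4, 8, 12)]
rows_tuple_level = tuple_level_filter(blocks, 5, 6)
rows_block_level, blocks_scanned = block_level_filter(blocks, 5, 6)
```

In this toy data set both filters return the same rows, but the block-level variant reads only one of the four blocks. Under the assumed cost model, near data filtering is chosen when few rows pass the predicate and the filter is fast relative to the link, and it is skipped when nearly all rows pass.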
