Zimeng Zhou, Xuan Sun, Jinghuan Yu, Sarana Nutanong, C. Xue
2018 Ninth International Green and Sustainable Computing Conference (IGSC), published 2018-10-01
DOI: 10.1109/IGCC.2018.8752112 (https://doi.org/10.1109/IGCC.2018.8752112)
Near Data Filtering for Distributed Database Systems
Over the past decade, data movement costs have come to dominate the execution time of data-intensive applications on distributed systems, and they are expected to become even more important in the future. Near data processing, which brings compute resources closer to the data source, is a straightforward way to reduce data movement. This paper explores near data processing in a generic distributed system, with the goal of improving performance by reducing data movement. An efficient near data filtering solution is designed and implemented by introducing a filter layer that performs tuple-level near data filtering. To reduce the idle time of processing nodes and improve data transmission throughput, the proposed solution is extended to support block-level near data filtering by creating an index for each data block. Furthermore, to answer the question of when and how to perform near data filtering, this paper proposes an adaptive near data filtering solution that balances computation against data transmission throughput. Experimental results show that the proposed solutions outperform the best existing method in most cases. The adaptive near data filtering solution achieves an average speedup factor of 4.59 for queries with low selectivity.
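The difference between tuple-level and block-level filtering described in the abstract can be illustrated with a minimal sketch. The abstract does not say what kind of per-block index the paper builds, so the sketch assumes a simple min/max summary over a single key column. The names `Row`, `Block`, `tuple_level_filter`, and `block_level_filter` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Hypothetical schema for illustration: (key, payload).
Row = Tuple[int, str]


@dataclass
class Block:
    """A data block with an assumed min/max index over the key column."""
    rows: List[Row]
    min_key: int
    max_key: int

    @classmethod
    def build(cls, rows: List[Row]) -> "Block":
        # Assumes a non-empty block; the index is computed once at build time.
        keys = [row[0] for row in rows]
        return cls(rows=rows, min_key=min(keys), max_key=max(keys))


def tuple_level_filter(blocks: Iterable[Block], lo: int, hi: int) -> Iterator[Row]:
    """Test every tuple near the data; only matching tuples are sent onward."""
    for block in blocks:
        for row in block.rows:
            if lo <= row[0] <= hi:
                yield row


def block_level_filter(blocks: Iterable[Block], lo: int, hi: int) -> Iterator[Row]:
    """Consult the per-block index first and skip blocks that cannot match."""
    for block in blocks:
        if block.max_key < lo or block.min_key > hi:
            continue  # No key in this block falls in [lo, hi]; skip the scan.
        for row in block.rows:
            if lo <= row[0] <= hi:
                yield row


blocks = [
    Block.build([(1, "a"), (2, "b")]),
    Block.build([(10, "c"), (12, "d")]),
    Block.build([(20, "e")]),
]
matches = list(block_level_filter(blocks, 10, 12))
```

Both functions return the same rows. Under the min/max assumption, the block-level variant avoids scanning blocks whose key range lies entirely outside the predicate, which is one plausible way an index could cut the filter layer's work.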