OL-BaS: Efficient data load-balancing for skewed data with approximate load statistics
Djahida Belayadi, Khaled-Walid Hidouci, Khadidja Midoun
2018 International Symposium on Programming and Systems (ISPS), April 2018. DOI: 10.1109/ISPS.2018.8379005
Data skew can significantly degrade query performance in distributed systems. In particular, when data is range-partitioned, it is often unevenly distributed across the partitions. As tuples are continuously inserted and deleted, some data must be moved from hot nodes to the least-loaded ones to satisfy the storage-balance requirement. These movements significantly affect the maintenance of the load statistics associated with each node, such as its partition boundaries and load size. Efficient state-of-the-art solutions to the data skew problem require global load statistics, at a cost of O(log n) messages. In this paper, we propose an efficient online load-balancing algorithm for range-partitioned data. Our solution is based on the fuzzy image (FZIM) concept: both clients and nodes hold only approximate knowledge of the actual partition statistics, yet they can locate any data item almost as efficiently as with exact partition statistics. Furthermore, maintaining the load-distribution statistics requires no additional messages, whereas efficient state-of-the-art solutions require at least O(log n) messages.
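To make the idea of locating data with approximate partition statistics concrete, here is a minimal sketch in Python. It is not the FZIM algorithm from the paper; it is a generic forwarding-plus-image-adjustment scheme built on stated assumptions. All names (`RangeCluster`, `Client`, `route`, `rebalance`, `adjust`, `ratio`) are hypothetical. The sketch assumes a fixed number of nodes, distinct keys no smaller than the first lower bound, and that a hot node sheds boundary keys only to an adjacent node so that ranges stay contiguous (a simplification of moving data to "the least-loaded" node). Servers know only their own bounds. Clients keep a possibly stale image of the bounds, and the owner's current bounds are piggybacked on the ordinary reply instead of being sent in a separate message.

```python
from bisect import bisect_right, insort


class RangeCluster:
    """Hypothetical range-partitioned store: node i owns keys in [lows[i], lows[i+1])."""

    def __init__(self, lows):
        self.lows = list(lows)               # exact lower bounds, maintained by the servers
        self.data = [[] for _ in self.lows]  # sorted, distinct keys held by each node

    def high(self, i):
        # Exclusive upper bound of node i; None means unbounded (last node).
        return self.lows[i + 1] if i + 1 < len(self.lows) else None

    def owns(self, i, key):
        hi = self.high(i)
        return self.lows[i] <= key and (hi is None or key < hi)

    def route(self, guess, key):
        """Start at the client's guessed node and forward using only local bounds."""
        assert key >= self.lows[0], "keys below the global lower bound are not handled"
        i, hops = guess, 0
        while not self.owns(i, key):
            i += -1 if key < self.lows[i] else 1  # step toward the owning node
            hops += 1
        return i, hops

    def hint(self, i):
        # Owner's current bounds, piggybacked on the normal reply (no extra message).
        return i, self.lows[i], self.high(i)

    def rebalance(self, i, ratio=2):
        """If node i is hot, shift boundary keys to its lighter neighbour."""
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(self.lows)]
        if not nbrs:
            return
        j = min(nbrs, key=lambda n: len(self.data[n]))
        if len(self.data[i]) <= ratio * max(len(self.data[j]), 1):
            return
        # Move half the load difference; m < len(data[i]), so node i keeps a key.
        m = (len(self.data[i]) - len(self.data[j])) // 2
        if m <= 0:
            return
        if j == i + 1:  # largest keys go right; the neighbour's lower bound drops
            moved = self.data[i][-m:]
            del self.data[i][-m:]
            self.data[j][:0] = moved
            self.lows[j] = moved[0]
        else:           # smallest keys go left; node i's own lower bound rises
            moved = self.data[i][:m]
            del self.data[i][:m]
            self.data[j].extend(moved)
            self.lows[i] = self.data[i][0]

    def insert(self, guess, key):
        owner, hops = self.route(guess, key)
        if key not in self.data[owner]:
            insort(self.data[owner], key)
            self.rebalance(owner)
        return hops, self.hint(owner)

    def lookup(self, guess, key):
        owner, hops = self.route(guess, key)
        return key in self.data[owner], hops, self.hint(owner)


class Client:
    """Client holding an approximate (possibly stale) image of the lower bounds."""

    def __init__(self, image):
        self.image = list(image)

    def guess(self, key):
        # Any guess is safe because servers forward; an accurate image only saves hops.
        return max(bisect_right(self.image, key) - 1, 0)

    def adjust(self, hint):
        j, low, high = hint
        self.image[j] = low
        if high is not None:
            self.image[j + 1] = high
        for k in range(j - 1, -1, -1):          # keep the image sorted below j
            self.image[k] = min(self.image[k], self.image[k + 1])
        for k in range(j + 1, len(self.image)):  # and above j
            self.image[k] = max(self.image[k], self.image[k - 1])

    def insert(self, cluster, key):
        hops, hint = cluster.insert(self.guess(key), key)
        self.adjust(hint)
        return hops

    def lookup(self, cluster, key):
        found, hops, hint = cluster.lookup(self.guess(key), key)
        self.adjust(hint)
        return found, hops
```

Consider a three-node cluster that starts with bounds `[0, 100, 200]` and receives skewed inserts of keys 0 to 59. Rebalancing pulls the boundaries down, so a fresh client that still holds `[0, 100, 200]` finds key 59 only after forwarding. The piggybacked hint corrects its image, and a second lookup of the same key goes straight to the owner. A stale image therefore never causes a wrong answer. It only costs extra hops.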