OPTIMA: On-Line Balancing of Range-Partitioned Data with Imperfect Partitioning Vector
Djahida Belayadi, Khaled-Walid Hidouci, Khadidja Midoun
International Journal of Computational Linguistics Research, 2018-12-01. DOI: 10.6025/JCL/2018/9/4/135-147
Range queries play a crucial role in large-scale data analysis. Unfortunately, their performance may be severely degraded by data skew. This problem is often faced in large-scale parallel databases, peer-to-peer (P2P) systems, and Cloud computing. State-of-the-art methods designed to handle it offer significant improvements over naive implementations. However, performance could be further improved by reducing the cost of acquiring and broadcasting global skew knowledge. Ganesan, Bawa and Garcia-Molina proposed a load-balancing algorithm that guarantees a good ratio between the maximum and minimum loads among nodes. However, their algorithm requires global max-min load information in order to apply local load-balancing operations. This global load information can be obtained with O(log n) messages. To reduce this cost, we propose OPTIMA, a novel online load-balancing approach for range-partitioned data. Whenever a partition becomes overloaded, data transfers are performed in the background from the most loaded nodes to the least loaded ones, as in the work of Ganesan et al. As a result, the partition boundaries and data sizes change. The key point of our proposal is the imperfect knowledge of the global load information (partition statistics). We introduce the concept of an “Imperfect Partitioning Vector” (IPV), in which both nodes and clients have approximate information about the load distribution. They can nevertheless locate any data with almost the same efficiency as with exact partition statistics. Furthermore, maintaining the load distribution statistics does not require exchanging additional messages or maintaining a data structure, in contrast to efficient state-of-the-art solutions, which require at least O(log n) messages.
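To make the idea of locating data with an approximate partitioning vector concrete, the following Python fragment gives a minimal sketch. It is not the authors' protocol: the abstract does not specify how a stale vector is corrected, so the neighbor-forwarding loop and the piggybacked repair of the client's view are assumptions made purely for illustration, and the names `locate`, `owner_range`, `true_bounds`, `upper` and `view` are hypothetical.

```python
from bisect import bisect_right


def owner_range(true_bounds, upper, i):
    """Return the true key range [low, high) of node i."""
    high = true_bounds[i + 1] if i + 1 < len(true_bounds) else upper
    return true_bounds[i], high


def locate(true_bounds, upper, view, key):
    """Find the node owning `key`, starting from a possibly stale `view`.

    `true_bounds[i]` is the real lower bound of node i; `view` is the
    client's approximate copy of that vector. Returns a pair
    (node_index, forwarding_hops) and repairs `view` in place.
    """
    if not (true_bounds[0] <= key < upper):
        raise KeyError(key)
    n = len(true_bounds)
    # First guess, taken from the approximate vector only.
    i = min(max(bisect_right(view, key) - 1, 0), n - 1)
    hops = 0
    while True:
        low, high = owner_range(true_bounds, upper, i)
        if low <= key < high:
            break
        # Assumed correction: forward to the adjacent node on the right side.
        i += 1 if key >= high else -1
        hops += 1
    # Assumed repair: the reply carries the owner's true range, so the
    # client learns it without sending any extra message.
    view[i] = low
    if i + 1 < n:
        view[i + 1] = high
    # Keep the view sorted so that bisect remains valid on later lookups.
    for j in range(i - 1, -1, -1):
        view[j] = min(view[j], view[j + 1])
    for j in range(i + 2, n):
        view[j] = max(view[j], view[j - 1])
    return i, hops


# A balancing step moved the boundary between nodes 0 and 1 from 30 to 50,
# but the client still holds the old vector.
true_bounds = [0, 50, 60]
view = [0, 30, 60]
print(locate(true_bounds, 100, view, 40))  # (0, 1): one forwarding hop
print(locate(true_bounds, 100, view, 40))  # (0, 0): the view was repaired
```

Under these assumptions, a stale entry costs one extra hop on the first lookup and nothing afterwards, which is one way to read the claim that data can be located with almost the same efficiency as with exact partition statistics.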