OPTIMA: On-Line Balancing of Range-Partitioned Data with Imperfect Partitioning Vector

Djahida Belayadi, Khaled-Walid Hidouci, Khadidja Midoun
Journal: International Journal of Computational Linguistics Research
Publication type: Journal Article
Published: 2018-12-01
DOI: https://doi.org/10.6025/JCL/2018/9/4/135-147
Citations: 0

Abstract

Range queries play a crucial role in large-scale data analysis. Unfortunately, their performance can be severely degraded by data skew. This problem arises in large-scale parallel databases, peer-to-peer (P2P) systems, and cloud computing. State-of-the-art methods designed to handle it offer significant improvements over naive implementations, but performance could be improved further by reducing the cost of acquiring and broadcasting global skew knowledge. Ganesan, Bawa and Garcia-Molina proposed a load-balancing algorithm that guarantees a good ratio between the maximum and minimum loads among nodes. However, their algorithm needs global max-min load information in order to apply local load-balancing operations, and obtaining that information costs O(log n) messages. To reduce this cost, we propose OPTIMA, a novel online load-balancing approach for range-partitioned data. Whenever a partition becomes overloaded, data is transferred in the background from the most loaded nodes to the least loaded ones, as in the work of Ganesan et al. As a result, the partition boundaries and data sizes change. The key point of our proposal is imperfect knowledge of the global load information (partition statistics). We introduce the concept of the "Imperfect Partitioning Vector" (IPV), in which both nodes and clients hold only approximate information about the load distribution. They can nevertheless locate any data with almost the same efficiency as with exact partition statistics. Furthermore, maintaining the load distribution statistics does not require exchanging additional messages or maintaining a dedicated data structure, in contrast to efficient state-of-the-art solutions, which require at least O(log n) messages.
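
The abstract does not say how a lookup copes with a stale partitioning vector, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that the vector stores one lower bound per node, that a node which does not own a key forwards the request to an adjacent node, and that the reply carries the owner's true range so the caller can repair its view without extra messages. The names `Node`, `refresh`, and `locate` are hypothetical.

```python
from bisect import bisect_right


class Node:
    """One partition that owns the keys in the half-open range [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi


def refresh(ipv, i, lo, hi):
    """Assumed lazy repair: fold node i's true range into the stale vector."""
    ipv[i] = lo
    for j in range(i):                # lower bounds to the left cannot exceed lo
        ipv[j] = min(ipv[j], lo)
    for j in range(i + 1, len(ipv)):  # lower bounds to the right start at hi or later
        ipv[j] = max(ipv[j], hi)


def locate(nodes, ipv, key):
    """Return (owner index, forwarding hops) for key, repairing ipv afterwards."""
    if not (nodes[0].lo <= key < nodes[-1].hi):
        raise KeyError(key)
    # The first guess comes from the approximate vector of lower bounds.
    i = max(bisect_right(ipv, key) - 1, 0)
    hops = 0
    # A node that does not own the key forwards it to the adjacent node.
    while not (nodes[i].lo <= key < nodes[i].hi):
        i += 1 if key >= nodes[i].hi else -1
        hops += 1
    refresh(ipv, i, nodes[i].lo, nodes[i].hi)
    return i, hops


# True ranges after a rebalancing step moved one boundary from 200 to 150.
nodes = [Node(0, 100), Node(100, 150), Node(150, 300), Node(300, 400)]
ipv = [0, 100, 200, 300]  # the client's stale view of the lower bounds

first = locate(nodes, ipv, 160)   # stale guess is node 1, one hop to node 2
second = locate(nodes, ipv, 160)  # repaired vector points straight at node 2
```

In this toy run the stale vector costs a single extra hop, and the second lookup for the same key goes directly to the owner. The clamping in `refresh` keeps the vector sorted so that the binary search stays valid; that is a design choice of this sketch rather than something the abstract specifies.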