Tutorial: Adaptive Replication and Partitioning in Data Systems
Brad Glasbergen, Michael Abebe, Khuzaima S. Daudjee
International Middleware Conference, 2018-12-10
DOI: 10.1145/3279945.3279946 (https://doi.org/10.1145/3279945.3279946)
Citations: 2
Abstract
To meet growing application demands, distributed data systems replicate and partition data across multiple machines. Replication increases the resource and request processing capabilities of a system by spreading copies of the data across multiple machines, while partitioning splits data across machines to achieve the same objectives. Replication and partitioning present different trade-offs in the form of replication maintenance and multi-machine coordination costs, which system administrators must carefully evaluate. Traditionally, administrators make replication and partitioning decisions based on their understanding of the application workload, which results in suboptimal performance if the system is misconfigured or if the workload changes. However, systems that adaptively employ replication and partitioning can adjust these decisions based on workload observations and predictions, which improves performance and reduces complexity for administrators. In this tutorial, we present an overview of techniques used by systems to adaptively partition and replicate data and services. We focus on the decision-making strategies employed by these systems, and how these decisions are executed in an online environment. Finally, we identify opportunities for research in the area.