High Availability Framework and Query Fault Tolerance for Hybrid Distributed Database Systems
Krishna Kantikiran Pasupuleti, B. Klots, Vijayakrishnan Nagarajan, Ananthakiran Kandukuri, N. Agarwal
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022-10-17
DOI: 10.1145/3511808.3557086 (https://doi.org/10.1145/3511808.3557086)
Citations: 0
Abstract
Modern commercial database systems are increasingly evolving into a hybrid distributed system model where a primary database host system enlists the services of a loosely coupled secondary system that acts as an accelerator. Often the secondary system is a distributed system that can perform specific tasks massively parallelized, with results fed back to the host database. Similar models can also be seen in architectures that separate compute from storage. As the scale of the system grows, failures of nodes become common, and the architectural goal is to recover the system with minimal disruption to the workload as seen by the user. This paper introduces a new framework that allows a host database to efficiently manage the availability of a massive secondary distributed system and describes a mechanism to achieve query fault tolerance at the primary database by transparently re-executing query (sub)plans on the secondary distributed system. The focus is on improving two important aspects of disruption: downtime and transparency to the user. The proposed mechanisms achieve quick recovery, reduced duration of downtime and isolation of errors during query execution, thus improving execution transparency for the users.
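The idea of transparently re-executing a query (sub)plan can be illustrated with a minimal sketch. This is not the paper's implementation: the class names `Host` and `SecondaryNode`, the `NodeFailure` exception, and the simple "drop the failed node and retry on the next one" policy are all hypothetical, chosen only to show how a host could hide a secondary-node failure from the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class NodeFailure(Exception):
    """Hypothetical error: a secondary node failed while running a subplan."""


@dataclass
class SecondaryNode:
    name: str
    healthy: bool = True

    def run(self, subplan: Callable[[], int]) -> int:
        # Simulate a crash: an unhealthy node cannot complete the subplan.
        if not self.healthy:
            raise NodeFailure(self.name)
        return subplan()


@dataclass
class Host:
    nodes: List[SecondaryNode]
    available: List[SecondaryNode] = field(default_factory=list)
    retries: int = 0

    def __post_init__(self) -> None:
        # Assumption: the host initially considers every node available.
        self.available = list(self.nodes)

    def execute(self, subplan: Callable[[], int]) -> int:
        # Run the subplan on the first available node. On failure, mark the
        # node unavailable and re-execute the same subplan on the next node.
        # The caller only ever sees the final result.
        while self.available:
            node = self.available[0]
            try:
                return node.run(subplan)
            except NodeFailure:
                self.available.remove(node)
                self.retries += 1
        raise RuntimeError("no secondary nodes available")


# Usage: the first node is down, so the subplan is re-executed on the second.
host = Host([SecondaryNode("n1", healthy=False), SecondaryNode("n2")])
result = host.execute(lambda: sum(range(10)))
```

In this toy model the error is isolated inside `Host.execute`: the caller receives the result from the surviving node, and the failed node is excluded from later queries. A real system would additionally need failure detection, recovery of the failed node, and handling of partially produced results, none of which this sketch attempts.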