{"title":"IBIS:嵌入式大数据I/O调度器","authors":"Yiqi Xu, Ming Zhao","doi":"10.1145/2907294.2907319","DOIUrl":null,"url":null,"abstract":"Big-data systems are increasingly shared by diverse, data-intensive applications from different domains. However, existing systems lack the support for I/O management, and the performance of big-data applications degrades in unpredictable ways when they contend for I/Os. To address this challenge, this paper proposes IBIS, an Interposed Big-data I/O Scheduler, to provide I/O performance differentiation for competing applications in a shared big-data system. IBIS transparently intercepts, isolates, and schedules an application's different phases of I/Os via an I/O interposition layer on every datanode of the big-data system. It provides a new proportional-share I/O scheduler, SFQ(D2), to allow applications to share the I/O service of each datanode with good fairness and resource utilization. It enables the distributed I/O schedulers to coordinate with one another and to achieve proportional sharing of the big-data system's total I/O service in a scalable manner. Finally, it supports the shared use of big-data resources by diverse frameworks and manages the I/Os from different types of big-data workloads (e.g., batch jobs vs. queries) across these frameworks. The prototype of IBIS is implemented in Hadoop/YARN, a widely used big-data system. Experiments based on a variety of representative applications (WordCount, TeraSort, Facebook, TPC-H) show that IBIS achieves good total-service proportional sharing with low overhead in both application performance and resource usages. IBIS is also shown to support various performance policies: it can deliver stronger performance isolation than native Hadoop/YARN (99% better for WordCount and 15% better for TPC-H queries) with good resource utilization; and it can also achieve perfect proportional slowdown with better application performance (30% better than native Hadoop).","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"32 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"IBIS: Interposed Big-data I/O Scheduler\",\"authors\":\"Yiqi Xu, Ming Zhao\",\"doi\":\"10.1145/2907294.2907319\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Big-data systems are increasingly shared by diverse, data-intensive applications from different domains. However, existing systems lack the support for I/O management, and the performance of big-data applications degrades in unpredictable ways when they contend for I/Os. To address this challenge, this paper proposes IBIS, an Interposed Big-data I/O Scheduler, to provide I/O performance differentiation for competing applications in a shared big-data system. IBIS transparently intercepts, isolates, and schedules an application's different phases of I/Os via an I/O interposition layer on every datanode of the big-data system. It provides a new proportional-share I/O scheduler, SFQ(D2), to allow applications to share the I/O service of each datanode with good fairness and resource utilization. It enables the distributed I/O schedulers to coordinate with one another and to achieve proportional sharing of the big-data system's total I/O service in a scalable manner. Finally, it supports the shared use of big-data resources by diverse frameworks and manages the I/Os from different types of big-data workloads (e.g., batch jobs vs. queries) across these frameworks. The prototype of IBIS is implemented in Hadoop/YARN, a widely used big-data system. Experiments based on a variety of representative applications (WordCount, TeraSort, Facebook, TPC-H) show that IBIS achieves good total-service proportional sharing with low overhead in both application performance and resource usages. IBIS is also shown to support various performance policies: it can deliver stronger performance isolation than native Hadoop/YARN (99% better for WordCount and 15% better for TPC-H queries) with good resource utilization; and it can also achieve perfect proportional slowdown with better application performance (30% better than native Hadoop).\",\"PeriodicalId\":20515,\"journal\":{\"name\":\"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing\",\"volume\":\"32 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2907294.2907319\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2907294.2907319","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Big-data systems are increasingly shared by diverse, data-intensive applications from different domains. However, existing systems lack the support for I/O management, and the performance of big-data applications degrades in unpredictable ways when they contend for I/Os. To address this challenge, this paper proposes IBIS, an Interposed Big-data I/O Scheduler, to provide I/O performance differentiation for competing applications in a shared big-data system. IBIS transparently intercepts, isolates, and schedules an application's different phases of I/Os via an I/O interposition layer on every datanode of the big-data system. It provides a new proportional-share I/O scheduler, SFQ(D2), to allow applications to share the I/O service of each datanode with good fairness and resource utilization. It enables the distributed I/O schedulers to coordinate with one another and to achieve proportional sharing of the big-data system's total I/O service in a scalable manner. Finally, it supports the shared use of big-data resources by diverse frameworks and manages the I/Os from different types of big-data workloads (e.g., batch jobs vs. queries) across these frameworks. The prototype of IBIS is implemented in Hadoop/YARN, a widely used big-data system. Experiments based on a variety of representative applications (WordCount, TeraSort, Facebook, TPC-H) show that IBIS achieves good total-service proportional sharing with low overhead in both application performance and resource usages. IBIS is also shown to support various performance policies: it can deliver stronger performance isolation than native Hadoop/YARN (99% better for WordCount and 15% better for TPC-H queries) with good resource utilization; and it can also achieve perfect proportional slowdown with better application performance (30% better than native Hadoop).