{"title":"Scaling Blockchains Using Pipelined Execution and Sparse Peers","authors":"Parth Thakkar, Senthilnathan Natarajan","doi":"10.1145/3472883.3486975","DOIUrl":"https://doi.org/10.1145/3472883.3486975","url":null,"abstract":"Large cloud providers such as AWS and IBM now provide managed blockchain platforms, showcasing an active interest in blockchains. Unfortunately, blockchains provide poor performance and scalability. This is true even for the Execute-Order-Validate (EOV) style of blockchains, which improves over the traditional Order-Execute architecture. We experimentally show that EOV platforms scale poorly under both vertical and horizontal scaling approaches. We find that throughput is bottlenecked by the Validation and Commit phases, which utilize resources poorly, limiting performance and scalability. We introduce three ideas to improve the performance, scalability, and cost-efficiency of EOV platforms. We first propose a provably correct way to pipeline the Validation and Commit phases. We then introduce a new type of node, called a sparse peer, that validates and commits only a subset of transactions. Finally, we propose a technique that makes it possible for these systems to scale elastically with the load. Our implementation provides 3.7x and 2.1x higher throughput than Hyperledger Fabric and FastFabric for the same infrastructure cost while providing their peak throughputs at 87% and 50% lower cost. It also makes dynamic horizontal scaling practical by providing 12-26x faster scale-up times.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75954403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHOWAR","authors":"A. F. Baarzi, G. Kesidis","doi":"10.1145/3472883.3486999","DOIUrl":"https://doi.org/10.1145/3472883.3486999","url":null,"abstract":"The microservices architecture has been widely adopted in designing distributed cloud applications, where an application is decoupled into multiple small components (i.e., \"microservices\"). One of the challenges in deploying microservices is finding the optimal amount of resources (i.e., size) and the number of instances (i.e., replicas) for each microservice in order to maintain good performance while preventing cost-ineffective resource wastage and under-utilization. This paper presents SHOWAR, a framework that configures the resources by determining the number of replicas (horizontal scaling) and the amount of CPU and memory for each microservice (vertical scaling). For vertical scaling, SHOWAR uses the empirical variance in historical resource usage to find the optimal size and mitigate resource wastage. For horizontal scaling, SHOWAR uses basic ideas from control theory along with kernel-level performance metrics. Additionally, once the size for each microservice is found, SHOWAR bridges the gap between optimal resource allocation and scheduling by generating affinity rules (i.e., hints) for the scheduler to further improve performance. Our experiments, using a variety of microservice applications and real-world workloads, show that, compared to state-of-the-art autoscaling and scheduling systems, SHOWAR improves resource allocation by up to 22% on average while improving the 99th-percentile end-to-end user request latency by 20%.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80800215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trisk","authors":"Y. Mao, Yuan-Hui Huang, Runxin Tian, Xin Wang, Richard T. B. Ma","doi":"10.1145/3472883.3487010","DOIUrl":"https://doi.org/10.1145/3472883.3487010","url":null,"abstract":"Due to the long-running and unpredictable nature of stream processing, any statically configured execution of stream jobs fails to process data in a timely and efficient manner. To meet performance requirements, stream jobs need to be reconfigured dynamically. In this paper, we present Trisk, a control plane that supports versatile reconfigurations while maintaining high efficiency with easy-to-use programming APIs. Trisk enables versatile reconfigurations with usability based on a task-centric abstraction, and encapsulates primitive operations such that reconfigurations can be described by composing the primitive operations on the abstraction. Trisk adopts a partial pause-and-resume design for efficiency, through which the synchronization mechanisms of the native stream systems can further be leveraged. We implement Trisk on Apache Flink and demonstrate its usage and performance under realistic application scenarios. We show that Trisk executes reconfigurations with shorter completion time and comparable latency compared to a state-of-the-art fluid mechanism for state management.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77745636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lasagna: Accelerating Secure Deep Learning Inference in SGX-enabled Edge Cloud","authors":"Yuepeng Li, Deze Zeng, Lin Gu, Quan Chen, Song Guo, Albert Y. Zomaya, M. Guo","doi":"10.1145/3472883.3486988","DOIUrl":"https://doi.org/10.1145/3472883.3486988","url":null,"abstract":"Edge intelligence has already been widely regarded as a key enabling technology in a variety of domains. Alongside this prosperity, increasing concern has been raised over the security and privacy of intelligent applications. As these applications are usually deployed on shared and untrusted edge servers, malicious co-located attackers, or even untrustworthy infrastructure providers, may acquire highly security-sensitive data and code (i.e., the pre-trained model). Software Guard Extensions (SGX) provides an isolated Trusted Execution Environment (TEE) that guarantees task security. However, we observe that DNN inference performance in SGX is severely degraded by the limited enclave memory space, owing to the resulting frequent page-swapping operations and the high enclave call overhead. To tackle this problem, we propose Lasagna, an SGX-oriented DNN inference acceleration framework that does not compromise task security. Lasagna consists of a local task scheduler and a global task balancer that optimize system performance by exploiting the layered structure of DNN models. Our experimental results show that our layer-aware Lasagna effectively speeds up well-known DNN inference in SGX by 1.31x-1.97x.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90259998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mu","authors":"Viyom Mittal, Shixiong Qi, Ratnadeep Bhattacharya, Xiaosu Lyu, Junfeng Li, Sameer G. Kulkarni, Dan Li, Jinho Hwang, K. Ramakrishnan, Timothy Wood","doi":"10.1145/3472883.3487014","DOIUrl":"https://doi.org/10.1145/3472883.3487014","url":null,"abstract":"This article determines the edaphic conditions on the surfaces of the protection forest in the alluvial plain of the Danube River. The dominant systematic soil unit was black meadow soil, with a share of 54%. Solonetz covered more than 20% of the area. The dominant physical characteristics are reflected in an unfavorable texture composition; the texture class ranged from sandy clay loam to clay loam. The determined soils are highly carbonated and have a low humus content. Salinization was determined on 46% of the management unit Ristovača. The highest content of available water and carbon was detected in the black meadow soil.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82205340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Provisioning Differentiated Last-Level Cache Allocations to VMs in Public Clouds","authors":"Mohammad Shahrad, S. Elnikety, R. Bianchini","doi":"10.1145/3472883.3487006","DOIUrl":"https://doi.org/10.1145/3472883.3487006","url":null,"abstract":"Public cloud providers offer access to hardware resources, and users rent resources by choosing among many VM sizes. While users choose the CPU core count and main memory size per VM, they cannot specify last-level cache (LLC) requirements. The LLC is typically shared among all cores of a modern CPU, causing cache contention and performance interference among co-located VMs. Consequently, a user's only way to avoid this interference is to purchase a full-server VM to prevent co-tenants. Although researchers have studied LLC partitioning, and despite its availability in commodity processors, LLC QoS is not offered to public cloud users today. Existing techniques rely mostly on performance profiling, which is not feasible in public cloud settings with opaque VMs. Moreover, prior work does not address how to deliver differentiated LLC allocations at scale. In this work, we develop CacheSlicer, the first system that provides cluster-level support for LLC management in a public cloud. We show how to provide LLC allocations in a major public cloud provider to enable differentiated VM categories, from which users select VMs that match their workloads. We integrate CacheSlicer into the Azure VM scheduler and show its effectiveness through extensive evaluations.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"199 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80078026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Service-Level Fault Injection Testing","authors":"Christopher S. Meiklejohn, Andrea Estrada, Yiwen Song, Heather Miller, Rohan Padhye","doi":"10.1145/3472883.3487005","DOIUrl":"https://doi.org/10.1145/3472883.3487005","url":null,"abstract":"Companies today increasingly rely on microservice architectures to deliver service for their large-scale mobile or web applications. However, not all developers working on these applications are distributed systems engineers, and therefore they may not anticipate partial failure, where one or more of their service's dependencies might be unavailable once deployed into production. It is therefore paramount that these issues be raised early and often, ideally in a testing environment or before the code ships to production. In this paper, we present an approach called service-level fault injection testing, and a prototype implementation called Filibuster, that can be used to systematically identify resilience issues early in the development of microservice applications. Filibuster combines static analysis and concolic-style execution with a novel dynamic reduction algorithm to extend existing functional test suites to cover failure scenarios with minimal developer effort. To demonstrate the applicability of our tool, we present a corpus of 4 real-world industrial microservice applications containing bugs. These applications and bugs are taken from publicly available information on chaos engineering experiments run by large companies in production. We then demonstrate how all of these chaos experiments could have been run during development instead, and how the bugs they discovered could have been detected long before they ended up in production.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"75 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83114235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elastic Hyperparameter Tuning on the Cloud","authors":"Lisa Dunlap, Kirthevasan Kandasamy, Ujval Misra, Richard Liaw, Michael I. Jordan, I. Stoica, Joseph Gonzalez","doi":"10.1145/3472883.3486989","DOIUrl":"https://doi.org/10.1145/3472883.3486989","url":null,"abstract":"Hyperparameter tuning is a necessary step in training and deploying machine learning models. Most prior work on hyperparameter tuning has studied methods for maximizing model accuracy under a time constraint, assuming a fixed cluster size. While this is appropriate in data center environments, the increased deployment of machine learning workloads in cloud settings necessitates studying hyperparameter tuning with an elastic cluster size and time and monetary budgets. While recent work has leveraged the elasticity of the cloud to minimize the execution cost of a pre-determined hyperparameter tuning job originally designed for fixed-cluster sizes, it does not aim to maximize accuracy. In this work, we aim to maximize accuracy given time and cost constraints. We introduce SEER---Sequential Elimination with Elastic Resources, an algorithm that tests different hyperparameter values in the beginning and maintains varying degrees of parallelism among the promising configurations to ensure that they are trained sufficiently before the deadline. Unlike fixed-cluster-size methods, it is able to exploit the flexibility in resource allocation that the elastic setting offers in order to avoid the undesirable effects of sublinear scaling. Furthermore, SEER can be easily integrated into existing systems and makes minimal assumptions about the workload. On a suite of benchmarks, we demonstrate that SEER outperforms both existing methods for hyperparameter tuning on a fixed cluster and naive extensions of these algorithms to the cloud setting.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84333373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating the cloud with quantum","authors":"K. Svore","doi":"10.1145/3472883.3517039","DOIUrl":"https://doi.org/10.1145/3472883.3517039","url":null,"abstract":"Azure Quantum. The power of Azure, accelerated by quantum. This is the present and future of cloud computing. A quantum-classical compute fabric that holds the promise of helping to remake our global economy, by offering new capabilities to help solve some of our planet's biggest challenges--in energy, climate, agriculture, and health--and across a broad span of industrial sectors, including computational chemistry, materials science, and nuclear and particle physics. Quantum computers are accelerators to large-scale classical compute and receive instructions and cues from classical processors. The success of quantum computation requires seamless integration in a high-performance cloud, to enable hyperscale workloads with complex classical pre- and post-processing. It also requires scaling up; to achieve the full promise, we need a leadership-class quantum machine with more than 1M physical qubits. I'll share the types of problems such a quantum machine will accelerate in the cloud, and how you can program and develop towards these advances today with Azure Quantum. I'll also share how quantum ideas are being emulated to enhance classical solutions, enabling quantum impact right now. Welcome to the quantum era of computing.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"118 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86204149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kraken","authors":"Vivek M. Bhasi, J. Gunasekaran, P. Thinakaran, Cyan Subhra Mishra, M. Kandemir, C. Das","doi":"10.1145/3472883.3486992","DOIUrl":"https://doi.org/10.1145/3472883.3486992","url":null,"abstract":"The growing popularity of microservices has led to the proliferation of online cloud service-based applications, which are typically modelled as Directed Acyclic Graphs (DAGs) comprising tens to hundreds of microservices. The vast majority of these applications are user-facing and hence have stringent SLO requirements. Serverless functions, having short resource provisioning times and instant scalability, are suitable candidates for developing such latency-critical applications. However, existing serverless providers are unaware of the workflow characteristics of application DAGs, leading to container over-provisioning in many cases. This is further exacerbated in the case of dynamic DAGs, where the function chain for an application is not known a priori. Motivated by these observations, we propose Kraken, a workflow-aware resource management framework that minimizes the number of containers provisioned for an application DAG while ensuring SLO compliance. We design and implement Kraken on OpenFaaS and evaluate it on a multi-node Kubernetes-managed cluster. Our extensive experimental evaluation using the DeathStarBench workload suite and real-world traces demonstrates that Kraken spawns up to 76% fewer containers, thereby improving container utilization and saving cluster-wide energy by up to 4x and 48%, respectively, when compared to state-of-the-art schedulers employed in serverless platforms.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78349882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}