Proceedings of the 2017 Symposium on Cloud Computing: Latest Articles, Page 6

ALOHA-KV: high performance read-only and write-only distributed transactions

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3127487

Hua Fan, W. Golab, C. B. Morrey

Abstract: There is a trend in recent database research to pursue coordination avoidance and weaker transaction isolation under a long-standing assumption: concurrent serializable transactions under read-write or write-write conflicts require costly synchronization, and thus may incur a steep price in terms of performance. In particular, distributed transactions, which access multiple data items atomically, are considered inherently costly. They require concurrency control for transaction isolation since both read-write and write-write conflicts are possible, and they rely on distributed commitment protocols to ensure atomicity in the presence of failures. This paper presents serializable read-only and write-only distributed transactions as a counterexample to show that concurrent transactions can be processed in parallel with low overhead despite conflicts. Inspired by the slotted ALOHA network protocol, we propose a simpler and leaner protocol for serializable read-only and write-only transactions, which uses only one round trip to commit a transaction in the absence of failures, irrespective of contention. Our design is centered around an epoch-based concurrency control (ECC) mechanism that minimizes synchronization conflicts and uses a small number of additional messages whose cost is amortized across many transactions. We integrate this protocol into ALOHA-KV, a scalable distributed key-value store for read-only and write-only transactions, and demonstrate that the system can process close to 15 million read/write operations per second per server when each transaction batches together thousands of such operations.

Citations: 7

Resilient cloud in dynamic resource environments

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132571

Fan Yang, A. Chien, Haryadi S. Gunawi

Abstract: Traditional cloud stacks are designed to tolerate random, small-scale failures, and can successfully deliver highly available cloud services and interactive services to end users. However, they fail to survive large-scale disruptions caused by major power outages, cyber-attacks, or region/zone failures. Such changes trigger cascading failures and significant service outages. We propose to understand the reasons for these failures, and to create reliable data services that can efficiently and robustly tolerate such large-scale resource changes. We believe cloud services will need to survive frequent, large dynamic resource changes in the future to be highly available. (1) Significant new challenges to cloud reliability are emerging, including cyber-attacks, power/network outages, and so on. For example, human error disrupted the Amazon S3 service on 02/28/17 [2]. Recently, hackers have even been attacking electric utilities, which may lead to more outages [3, 6]. (2) Increased attention to resource cost optimization will increase usage dynamism, as with Amazon Spot Instances [1]. (3) Availability-focused cloud applications will increasingly practice continuous testing to ensure they have no hidden source of catastrophic failure. For example, Netflix Simian Army can simulate the outages of individual servers, and even of an entire AWS region [4]. (4) Cloud applications with dynamic flexibility will reap numerous benefits, such as flexible deployments and managing cost arbitrage and reliability arbitrage across cloud providers and datacenters. Using Apache Cassandra [5] as the model system, we characterize its failure behavior under dynamic datacenter-scale resource changes. Each datacenter is volatile and randomly shut down with a given duty factor. We simulate a read-only workload on a quorum-based system deployed across multiple datacenters, varying (1) the system scale, (2) the fraction of volatile datacenters, and (3) the duty factor of volatile datacenters. We explore the space of various configurations, including replication factors and consistency levels, and measure the service availability (% of succeeded requests) and replication overhead (number of total replicas). Our results show that, in a volatile resource environment, the current replication and quorum protocols in Cassandra-like systems cannot achieve high availability and consistency with low replication overhead. Our contributions include: (1) a detailed characterization of failures under dynamic datacenter-scale resource changes, showing that the existing protocols in quorum-based systems cannot achieve high availability and consistency with low replication cost; and (2) a study of the best achievable availability of a data service in a dynamic datacenter-scale resource environment.
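The relationship between duty factor, replication factor, and availability described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' simulator: it assumes (hypothetically) that every replica sits in its own volatile datacenter, that each datacenter is up independently with probability equal to the duty factor, and that a read succeeds when at least a quorum of replicas is reachable. The function name `quorum_availability` and its parameters are illustrative only.

```python
from math import comb


def quorum_availability(replicas: int, quorum: int, duty_factor: float) -> float:
    """Probability that at least `quorum` of `replicas` datacenters are up.

    Assumptions (not taken from the abstract): one replica per datacenter,
    and each datacenter is up independently with probability `duty_factor`.
    """
    if not 0 <= quorum <= replicas:
        raise ValueError("quorum must be between 0 and replicas")
    if not 0.0 <= duty_factor <= 1.0:
        raise ValueError("duty_factor must be in [0, 1]")
    down = 1.0 - duty_factor
    # Sum the binomial probabilities of exactly `up` datacenters being available.
    return sum(
        comb(replicas, up) * duty_factor**up * down ** (replicas - up)
        for up in range(quorum, replicas + 1)
    )


if __name__ == "__main__":
    # Majority quorums for a few replication factors at a 90% duty factor.
    for n in (3, 5, 7):
        q = n // 2 + 1
        print(n, q, quorum_availability(n, q, 0.9))
```

Under these simplifying assumptions, raising the replication factor improves availability only at the cost of more total replicas, which is the trade-off the abstract measures.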

Citations: 2

PBSE: a robust path-based speculative execution for degraded-network tail tolerance in data-parallel frameworks

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3131622

Riza O. Suminto, Cesar A. Stuardo, Alexandra Clark, Huan Ke, Tanakorn Leesatapornwongsa, Bo Fu, D. Kurniawan, V. Martin, Maheswara Rao G. Uma, Haryadi S. Gunawi

Citations: 20

Building smart memories and high-speed cloud services for the internet of things with Derecho

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3134597

Sagar Jha, J. Behrens, Theo Gkountouvas, Mae Milano, Weijia Song, E. Tremel, Sydney Zink, K. Birman, R. V. Renesse

Abstract: The coming generation of Internet-of-Things (IoT) applications will process massive amounts of incoming data while supporting data mining and online learning. In cases with demanding real-time requirements, such systems behave as smart memories: a high-bandwidth service that captures sensor input, processes it using machine-learning tools, replicates and stores "interesting" data (discarding uninteresting content), updates knowledge models, and triggers urgently needed responses. Derecho is a high-throughput library for building smart memories and similar services. At its core, Derecho implements atomic multicast (Vertical Paxos) and state machine replication (the classic durable Paxos). Derecho's replicated template defines a replicated type; the corresponding objects are associated with subgroups, which can be sharded into key-value structures. The persistent and volatile storage templates implement version vectors with optional NVM persistence. These support time-indexed access, offering lock-free snapshot isolation that blends temporal precision and causal consistency. Derecho automates application management, supporting multigroup structures and providing consistent knowledge of the current membership mapping. A query can access data from many shards or subgroups, and consistency is guaranteed without any form of distributed locking. Whereas many systems run consensus on the critical path, Derecho requires consensus only when updating membership. By leveraging an RDMA data plane and NVM storage, and adopting a novel receiver-side batching technique, Derecho can saturate a 12.5GB RDMA network, sending millions of events per second in each subgroup or shard. In a single subgroup with 2-16 members, throughput peaks at 16 GB/s for large (100MB or more) objects. While key-value subgroups would typically use 2- or 3-member shards, unsharded subgroups could be large. In tests with a 128-member group, Derecho's multicast and Paxos protocols were just 3-5x slower than for a small group, depending on the traffic pattern. With network contention, slow members, or overlapping groups that generate concurrent traffic, Derecho's protocols remain stable and adapt to the available bandwidth.

Citations: 4

Job scheduling for data-parallel frameworks with hybrid electrical/optical datacenter networks

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132694

Zhuozhao Li, Haiying Shen

Citations: 3

HyperNF: building a high performance, high utilization and fair NFV platform

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3127489

Kenichi Yasukata, Felipe Huici, Vincenzo Maffione, G. Lettieri, Michio Honda

Abstract: Network Function Virtualization has been touted as the silver bullet for tackling a number of operator problems, including vendor lock-in, fast deployment of new functionality, converged management, and lower expenditure since packet processing runs on inexpensive commodity servers. The reality, however, is that, in practice, it has proved hard to achieve the stable, predictable performance provided by hardware middleboxes, and so operators have essentially resorted to throwing money at the problem, deploying highly underutilized servers (e.g., one NF per CPU core) in order to guarantee high performance during peak periods and meet SLAs. In this work we introduce HyperNF, a high performance NFV framework aimed at maximizing server performance when concurrently running large numbers of NFs. To achieve this, HyperNF implements hypercall-based virtual I/O, placing packet forwarding logic inside the hypervisor to significantly reduce I/O synchronization overheads. HyperNF improves throughput by 10%-73% depending on the NF, is able to closely match resource allocation specifications (with deviations of only 3.5%), and to efficiently cope with changing traffic loads.

Citations: 16

Selecting the best VM across multiple public clouds: a data-driven performance modeling approach

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3131614

N. Yadwadkar, Bharath Hariharan, Joseph E. Gonzalez, Burton J. Smith, R. Katz

Abstract: Users of cloud services are presented with a bewildering choice of VM types, and the choice of VM can have significant implications on performance and cost. In this paper we address the fundamental problem of accurately and economically choosing the best VM for a given workload and user goals. To address the problem of optimal VM selection, we present PARIS, a data-driven system that uses a novel hybrid offline and online data collection and modeling framework to provide accurate performance estimates with minimal data collection. PARIS is able to predict workload performance for different user-specified metrics, and resulting costs for a wide range of VM types and workloads across multiple cloud providers. When compared to sophisticated baselines, including collaborative filtering and a linear interpolation model using measured workload performance on two VM types, PARIS produces significantly better estimates of performance. For instance, it reduces runtime prediction error by a factor of 4 for some workloads on both AWS and Azure. The increased accuracy translates into a 45% reduction in user cost while maintaining performance.

Citations: 162

Distributed resource management across process boundaries

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132020

L. Suresh, P. Bodík, Ishai Menache, M. Canini, F. Ciucu

Citations: 40

Towards an emergency edge supercloud

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132253

Kolbeinn Karlsson, Zhiming Shen, Weijia Song, Hakim Weatherspoon, R. V. Renesse, S. Wicker

Citations: 0

Towards verifiable metering for database as a service providers

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3134349

Min Du, Ravishankar Ramamurthy

Citations: 0