Proceedings of the 2017 Symposium on Cloud Computing: Latest Papers (Page 5)

Efficient and consistent replication for distributed logs

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132695

Hua Fan, Jeffrey Pound, P. Bumbulis, Nathan Auch, Scott MacLean, Eric Garber, Anil K. Goel

Abstract: Distributed shared logs are a powerful building block for distributed systems. Because a distributed shared log provides fault-tolerant persistence and strong ordering guarantees, applications can use it to reliably communicate a stream of events between processes, for example to replicate application state or to build a reliable publish/subscribe system. The log itself must also replicate data to provide availability and fault tolerance. Key to the design of a distributed shared log is the choice of replication algorithm, which determines many properties of the system. We propose quorum-replication with meta-data exchange (QMX), an algorithm for consistent replication of log data that is linearizable while allowing writes to succeed with a single round trip to a quorum of replicas and allowing reads to generally be served by any single replica (read-one/write-quorum). This is achieved by coupling reads with an asynchronous message exchange algorithm that runs continuously among the replicas. The message exchange lets replicas infer the global state of writes across the cluster, and thus deduce which writes have been successfully quorum-replicated and which have not. This metadata allows any single replica to answer reads directly in many cases, though in the worst case a read must wait for the message-passing round to complete before being served, which requires a majority quorum of servers to be responsive.
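To make the read-one/write-quorum idea concrete, here is a minimal Python sketch under simplifying assumptions; it is not the QMX algorithm from the paper. The class `Replica`, the function `quorum_write`, and the one-shot `exchange_metadata` round are hypothetical names, and the sketch ignores log ordering, failures during the exchange, and the other details QMX needs for linearizability. A write succeeds once a majority stores the entry, and a replica answers a read locally only when the metadata it has gathered shows that a majority holds that log position; otherwise the reader has to wait for another exchange round.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical log replica for a read-one/write-quorum sketch (not QMX itself)."""
    name: str
    cluster_size: int
    log: dict = field(default_factory=dict)        # log position -> payload held locally
    peer_view: dict = field(default_factory=dict)  # peer name -> positions that peer reported holding

    def append(self, pos, payload):
        # Store the entry locally and acknowledge the write.
        self.log[pos] = payload
        return True

    def exchange_metadata(self, peers):
        # One metadata-exchange round, simulated synchronously: record which
        # log positions each other replica currently holds.
        for peer in peers:
            if peer is not self:
                self.peer_view[peer.name] = set(peer.log)

    def holders(self, pos):
        # Number of replicas this replica knows to hold `pos`, including itself.
        count = 1 if pos in self.log else 0
        return count + sum(1 for held in self.peer_view.values() if pos in held)

    def read(self, pos):
        # Answer locally only if the write is known to be quorum-replicated.
        majority = self.cluster_size // 2 + 1
        if pos in self.log and self.holders(pos) >= majority:
            return self.log[pos]
        return None  # caller must wait for a later exchange round


def quorum_write(reachable, cluster_size, pos, payload):
    # Single round trip: send to every reachable replica, succeed on a majority of acks.
    acks = sum(1 for replica in reachable if replica.append(pos, payload))
    return acks >= cluster_size // 2 + 1


if __name__ == "__main__":
    a, b, c = (Replica(n, 3) for n in "abc")
    print(quorum_write([a, b], 3, 0, "event-0"))  # True: 2 of 3 replicas acknowledged
    print(a.read(0))  # None: a cannot yet tell that b also holds position 0
    a.exchange_metadata([b, c])
    print(a.read(0))  # event-0
```

The point of the design is that the write path stays at one quorum round trip, while the cost of learning "this entry is on a majority" is moved into background metadata exchange that many reads can share.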

Citations: 0

QFrag: distributed graph search via subgraph isomorphism

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3131625

M. Serafini, G. D. F. Morales, Georgos Siganos

Abstract: This paper introduces QFrag, a distributed system for graph search on top of bulk synchronous processing (BSP) systems such as MapReduce and Spark. Searching for patterns in graphs is an important and computationally complex problem. Most current distributed search systems scale to graphs that do not fit in main memory by partitioning the input graph. For analytical queries, however, this approach entails running expensive distributed joins on large intermediate data. In this paper we explore an alternative approach: replicating the input graph and running independent parallel instances of a sequential graph search algorithm. In principle, this approach leads to an embarrassingly parallel problem, since workers can complete their tasks in parallel without coordination. However, the skew present in natural graphs makes this problem a deceitfully parallel one, i.e., an embarrassingly parallel problem with poor load balancing. We therefore introduce a task fragmentation technique that avoids stragglers while minimizing coordination. Our evaluation shows that QFrag outperforms BSP-based systems by orders of magnitude and performs similarly to asynchronous MPI-based systems on simple queries. Furthermore, it can run computationally complex analytical queries that other systems are unable to handle.
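The load-balancing problem that task fragmentation targets can be illustrated with a small scheduling sketch. This is not QFrag's actual technique: the per-vertex cost estimates, the `threshold` parameter, and the use of greedy longest-processing-time-first (LPT) assignment are assumptions chosen for illustration, and `fragment_tasks` / `assign_lpt` are hypothetical names. Each start vertex of the sequential search is one task; a task whose estimated cost exceeds the threshold is split into equal fragments, each covering a slice of that vertex's search space, so a single high-degree hub no longer dictates when the job finishes.

```python
import heapq
import math


def fragment_tasks(costs, threshold):
    """Split any start-vertex task whose estimated cost exceeds `threshold`
    into equal fragments (hypothetical cost model, illustrative only)."""
    tasks = []
    for vertex, cost in costs.items():
        k = max(1, math.ceil(cost / threshold))
        for i in range(k):
            # Fragment i of k explores the i-th slice of this vertex's search space.
            tasks.append(((vertex, i, k), cost / k))
    return tasks


def assign_lpt(tasks, workers):
    """Longest-processing-time-first: give each task to the least-loaded worker.
    Returns the per-worker plan and the makespan (finishing time of the slowest worker)."""
    heap = [(0.0, w) for w in range(workers)]
    plan = {w: [] for w in range(workers)}
    for task, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    makespan = max(load for load, _ in heap)
    return plan, makespan
```

With one expensive hub vertex (cost 100) and four cheap vertices (cost 10 each) on four workers, the unfragmented schedule finishes at 100 because one worker receives the whole hub, while a threshold of 25 splits the hub into four fragments and brings the makespan down to 35, the ideal 140 / 4. Fragmenting only the expensive tasks keeps the number of extra work units, and hence coordination, small.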

Citations: 27

Resilient cloud in dynamic resource environments

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132571

Fan Yang, A. Chien, Haryadi S. Gunawi

Abstract: Traditional cloud stacks are designed to tolerate random, small-scale failures, and can successfully deliver highly available cloud services and interactive services to end users. However, they fail to survive large-scale disruptions caused by major power outages, cyber-attacks, or region/zone failures. Such disruptions trigger cascading failures and significant service outages. We aim to understand the reasons for these failures and to create reliable data services that can efficiently and robustly tolerate such large-scale resource changes. We believe cloud services will need to survive frequent, large, dynamic resource changes in the future to be highly available. (1) Significant new challenges to cloud reliability are emerging, including cyber-attacks and power/network outages. For example, human error disrupted the Amazon S3 service on 02/28/17 [2], and hackers have recently been attacking electric utilities, which may lead to more outages [3, 6]. (2) Increased attention to resource cost optimization will increase usage dynamism, as with Amazon Spot Instances [1]. (3) Availability-focused cloud applications will increasingly practice continuous testing to ensure they have no hidden source of catastrophic failure; for example, Netflix's Simian Army can simulate outages of individual servers and even of an entire AWS region [4]. (4) Cloud applications with dynamic flexibility will reap numerous benefits, such as flexible deployments and the ability to manage cost arbitrage and reliability arbitrage across cloud providers and datacenters.

Using Apache Cassandra [5] as the model system, we characterize its failure behavior under dynamic datacenter-scale resource changes. Each volatile datacenter is randomly shut down according to a given duty factor. We simulate a read-only workload on a quorum-based system deployed across multiple datacenters, varying (1) system scale, (2) the fraction of volatile datacenters, and (3) the duty factor of volatile datacenters. We explore the space of configurations, including replication factors and consistency levels, and measure service availability (the percentage of requests that succeed) and replication overhead (the total number of replicas). Our results show that, in a volatile resource environment, the current replication and quorum protocols in Cassandra-like systems cannot achieve high availability and consistency with low replication overhead. Our contributions include: (1) a detailed characterization of failures under dynamic datacenter-scale resource changes, showing that existing protocols in quorum-based systems cannot achieve high availability and consistency with low replication cost; and (2) a study of the best achievable availability of data services in dynamic datacenter-scale resource environments.
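For the simplest case, the tension between consistency level, replication overhead, and availability under volatile datacenters can be computed analytically. The sketch below is not the authors' simulator; it assumes one replica per datacenter, independent failures, and a hypothetical `down_fraction` parameter for the probability that a volatile datacenter is shut down when a request arrives (the abstract's duty factor may be defined differently). A read at a given consistency level succeeds when at least `quorum` replicas are reachable, e.g., two of three for Cassandra's QUORUM with a replication factor of 3.

```python
from math import comb


def quorum_availability(replicas, volatile, down_fraction, quorum):
    """Probability that at least `quorum` of `replicas` are reachable.

    Assumptions (illustrative only): one replica per datacenter; `volatile` of
    those datacenters are each down independently with probability
    `down_fraction`; the remaining datacenters never fail.
    """
    stable = replicas - volatile
    p_up = 1.0 - down_fraction
    total = 0.0
    for up in range(volatile + 1):  # number of volatile replicas currently up
        if stable + up >= quorum:
            total += comb(volatile, up) * p_up**up * down_fraction**(volatile - up)
    return total
```

With three replicas, all in volatile datacenters that are each down half the time, a QUORUM read succeeds with probability 0.5, a ONE read with probability 0.875, and an ALL read with probability 0.125. If only one of the three datacenters is volatile, QUORUM reads always succeed (probability 1.0), because the two stable replicas already form a quorum. Stronger consistency therefore costs availability unless replicas are placed where they are unlikely to disappear together.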

Citations: 2

SLAQ: quality-driven scheduling for distributed machine learning

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3127490

Haoyu Zhang, Logan Stafman, Andrew Or, M. Freedman

Citations: 120

RStore: efficient multiversion document management in the cloud

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132693

Souvik Bhattacherjee, A. Deshpande

Abstract: Motivation. The iterative and exploratory nature of the data science process, combined with an increasing need to support debugging, historical queries, auditing, provenance, and reproducibility, creates a need to store and query a large number of versions of a dataset. This realization has led to many efforts, in academia [1, 3, 5, 6] and in industry (e.g., git, Datomic, noms), to build data management systems that support versioning as a first-class construct. These systems typically support rich versioning/branching functionality and complex queries over versioned information, but lack the capability to host versions of a collection of keyed records or documents in a distributed environment or a cloud. Alternatively, key-value stores (e.g., Apache Cassandra, HBase, MongoDB) are appealing in many collaborative scenarios spanning geographically distributed teams, since they offer centralized hosting of the data, are resilient to failures, can easily scale out, and can handle a large number of queries efficiently. However, they do not offer rich versioning and branching functionality akin to hosted version control systems (VCS) like GitHub. This work addresses the problem of compactly storing a large number of versions (snapshots) of a collection of keyed documents or records in a distributed environment, while efficiently answering a variety of retrieval queries over them.

RStore Overview. Our primary focus is to provide versioning and branching support for collections of records with unique identifiers. Like popular NoSQL systems, RStore supports a flexible data model; records of varying sizes, from a few bytes to a few MBs; and a variety of retrieval queries covering a wide range of use cases. Specifically, like NoSQL systems, it supports efficient retrieval of a specific record in a specific version (given a key and a version identifier), or of the entire evolution history of a given key. Like a VCS, it supports retrieving all records belonging to a specific version, for use cases that require updating a large number of records (e.g., by applying a data cleaning step). Finally, since retrieving an entire version might be unnecessary and expensive, it supports partial version retrieval given a range of keys and a version identifier.

Challenges. Addressing the above desiderata poses many design and computational challenges, and natural baseline approaches (see the full paper [2] for more details) that attempt to build this functionality on top of existing key-value stores suffer from critical limitations. First, most of those baseline approaches cannot directly support point queries targeting a specific record in a specific version (and, by extension, full or partial version retrieval queries) without constructing and maintaining explicit indexes. Second, all the viable baselines fundamentally require too many back-and-forths between the retrieval module and the backend key-value store; this
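The four query types described for RStore (a record in a version, a key's history, a full version, and a key-range partial version) can be illustrated with a naive delta-based store in the spirit of the baselines the abstract mentions. This is explicitly not RStore's design; the class `VersionedStore` and its methods are hypothetical. Each version stores only the records it changed and points to its parent version, so a point query walks the version lineage, and each step of that walk stands in for one request to a backend key-value store.

```python
class VersionedStore:
    """Hypothetical delta-based baseline for versioned keyed records (not RStore's design)."""

    def __init__(self):
        self.parent = {}  # version id -> parent version id (None for the root)
        self.delta = {}   # version id -> {key: value} records changed in that version

    def commit(self, version, parent, changes):
        # Record a new version (possibly a branch) as a delta over its parent.
        self.parent[version] = parent
        self.delta[version] = dict(changes)

    def _lineage(self, version):
        # Versions from `version` back to the root; each step stands in for a backend round trip.
        while version is not None:
            yield version
            version = self.parent[version]

    def get(self, key, version):
        # Point query: the value of `key` as seen in `version`.
        for v in self._lineage(version):
            if key in self.delta[v]:
                return self.delta[v][key]
        return None

    def checkout(self, version, lo=None, hi=None):
        # Full version retrieval, or partial retrieval for keys in [lo, hi]:
        # apply deltas from the root forward, then filter by key range.
        snapshot = {}
        for v in reversed(list(self._lineage(version))):
            snapshot.update(self.delta[v])
        return {k: val for k, val in snapshot.items()
                if (lo is None or k >= lo) and (hi is None or k <= hi)}

    def history(self, key):
        # Evolution history of `key`: every (version, value) that wrote it, in commit order.
        return [(v, d[key]) for v, d in self.delta.items() if key in d]
```

For example, after committing `v1`, then `v2` and `v3` as two branches of `v1`, `get("b", "v3")` must walk from `v3` back to `v1` to find the value. That walk grows with the length of the version chain, which is the kind of back-and-forth with the backend that the abstract identifies as a limitation.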

Citations: 1

Trustable virtual machine scheduling in a cloud

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3128608

Fabien Hermenier, L. Henrio

Citations: 3

GLoop: an event-driven runtime for consolidating GPGPU applications

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132023

Yusuke Suzuki, H. Yamada, S. Kato, K. Kono

Citations: 7

ALOHA-KV: high performance read-only and write-only distributed transactions

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3127487

Hua Fan, W. Golab, C. B. Morrey

Abstract: There is a trend in recent database research to pursue coordination avoidance and weaker transaction isolation under a long-standing assumption: concurrent serializable transactions under read-write or write-write conflicts require costly synchronization, and thus may incur a steep price in terms of performance. In particular, distributed transactions, which access multiple data items atomically, are considered inherently costly. They require concurrency control for transaction isolation, since both read-write and write-write conflicts are possible, and they rely on distributed commitment protocols to ensure atomicity in the presence of failures. This paper presents serializable read-only and write-only distributed transactions as a counterexample, showing that concurrent transactions can be processed in parallel with low overhead despite conflicts. Inspired by the slotted ALOHA network protocol, we propose a simpler and leaner protocol for serializable read-only and write-only transactions, which uses only one round trip to commit a transaction in the absence of failures, irrespective of contention. Our design is centered on an epoch-based concurrency control (ECC) mechanism that minimizes synchronization conflicts and uses a small number of additional messages whose cost is amortized across many transactions. We integrate this protocol into ALOHA-KV, a scalable distributed key-value store for read-only and write-only transactions, and demonstrate that the system can process close to 15 million read/write operations per second per server when each transaction batches together thousands of such operations.
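The core idea of epoch-based batching for read-only and write-only transactions can be sketched on a single node. This is a simplification under stated assumptions, not ALOHA-KV's distributed ECC protocol: the class `EpochStore`, ordering conflicting writes by transaction id, and an explicit `close_epoch` call are hypothetical choices. Write-only transactions are buffered in the open epoch and installed together when the epoch closes; read-only transactions see only state from closed epochs, so they never observe a partially applied write transaction, and the cost of closing an epoch is shared by every transaction in it.

```python
class EpochStore:
    """Single-node sketch of epoch-based batching for read-only / write-only
    transactions (hypothetical simplification, not ALOHA-KV's protocol)."""

    def __init__(self):
        self.epoch = 0        # id of the currently open epoch
        self.committed = {}   # state visible to readers: all closed epochs
        self.pending = []     # (txn_id, writes) buffered in the open epoch

    def write_txn(self, txn_id, writes):
        # Write-only transaction: buffered; becomes visible atomically at epoch end.
        self.pending.append((txn_id, dict(writes)))
        return self.epoch

    def read_txn(self, keys):
        # Read-only transaction: reads a consistent snapshot of closed epochs only.
        return {k: self.committed.get(k) for k in keys}

    def close_epoch(self):
        # Install buffered writes in a deterministic order (by txn id) so that
        # write-write conflicts resolve the same way everywhere, then open a new epoch.
        for _, writes in sorted(self.pending, key=lambda t: t[0]):
            self.committed.update(writes)
        self.pending.clear()
        self.epoch += 1
```

Because read-only transactions never write and write-only transactions never read, each read can be placed in the serial order after all writes of closed epochs and before the writes of the open one, and conflicting writes inside an epoch are ordered deterministically. In this sketch, writes from transactions 2 and 1 on the same keys are installed in id order, so readers see either neither transaction or the final values of both, never a mix.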

Citations: 7

Disaggregated operating system

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3131617

Yizhou Shan, Sumukh Hallymysore, Yutong Huang, Yilun Chen, Yiying Zhang

Citations: 2

Preserving I/O prioritization in virtualized OSes

Proceedings of the 2017 Symposium on Cloud Computing Pub Date : 2017-09-24 DOI: 10.1145/3127479.3127484

Kun Suo, Yong Zhao, J. Rao, Luwei Cheng, Xiaobo Zhou, F. Lau

Citations: 12