Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference): Latest Articles, Page 8

HP-Mapper: A High Performance Storage Driver for Docker Containers

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362718

Fan Guo, Yongkun Li, Min Lv, Yinlong Xu, John C.S. Lui

Abstract: Docker containers are widely deployed to provide lightweight virtualization, and they have many desirable features such as ease of deployment and near bare-metal performance. However, both the performance and cache efficiency of containers are still limited by their storage drivers, owing to coarse-grained copy-on-write operations and a large amount of redundancy in both I/O requests and the page cache. To improve the I/O performance and cache efficiency of containers, we develop HP-Mapper, a high-performance storage driver for Docker containers. HP-Mapper provides a two-level mapping strategy to support fine-grained copy-on-write with low overhead, and an efficient interception method to reduce redundant I/Os. Furthermore, it uses a novel cache management mechanism to reduce duplicate cached data. Experimental results with our prototype system show that HP-Mapper significantly reduces copy-on-write latency thanks to its finer-grained copy-on-write scheme. Moreover, HP-Mapper reduces cache usage by 65.4% on average by eliminating duplicated data. As a result, HP-Mapper improves the throughput of real-world workloads by up to 39.4% and improves the startup speed of containers by 2.0x.
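
To make the idea of fine-grained copy-on-write behind a two-level mapping concrete, here is a minimal sketch. It is not HP-Mapper's actual design, which the abstract does not detail; the class name `CowVolume`, the constants `BLOCK` and `BLOCKS_PER_CHUNK`, and the in-memory dictionaries are all hypothetical and chosen only for illustration. A first-level table maps a coarse chunk to a second-level table, which in turn maps a small block to its private copy, so a write copies one small block rather than a whole chunk.

```python
BLOCK = 4             # bytes per fine-grained block (tiny, for illustration only)
BLOCKS_PER_CHUNK = 4  # blocks covered by one first-level entry (assumed value)


class CowVolume:
    """Hypothetical copy-on-write view over a read-only base image."""

    def __init__(self, base: bytes):
        self.base = base       # shared, never modified
        self.top = {}          # level 1: chunk index -> {block index -> bytearray}
        self.copied_bytes = 0  # how much data copy-on-write has duplicated

    def _locate(self, offset):
        block = offset // BLOCK
        return block // BLOCKS_PER_CHUNK, block % BLOCKS_PER_CHUNK, offset % BLOCK

    def read(self, offset):
        chunk, blk, off = self._locate(offset)
        private = self.top.get(chunk, {}).get(blk)
        # Fall through to the shared base when the block was never written.
        return private[off] if private is not None else self.base[offset]

    def write(self, offset, value):
        chunk, blk, off = self._locate(offset)
        second = self.top.setdefault(chunk, {})  # level 2 table, created lazily
        if blk not in second:
            # Copy only the one block being modified, not the whole chunk.
            start = (chunk * BLOCKS_PER_CHUNK + blk) * BLOCK
            second[blk] = bytearray(self.base[start:start + BLOCK])
            self.copied_bytes += BLOCK
        second[blk][off] = value
```

With a 64-byte base, writing a single byte copies `BLOCK` bytes rather than a full `BLOCK * BLOCKS_PER_CHUNK`-byte chunk, which is the kind of saving a finer-grained scheme is after.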

Citations: 7

MME-FaaS Cloud-Native Control for Mobile Networks

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362722

Sonika Jindal, R. Ricci

Citations: 3

Reverb

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362733

R. Netravali, James W. Mickens

Abstract: Bugs are common in web pages. Unfortunately, traditional debugging primitives like breakpoints are crude tools for understanding the asynchronous, wide-area data flows that bind client-side JavaScript code and server-side application logic. In this paper, we describe Reverb, a powerful new debugger that makes data flows explicit and queryable. Reverb provides three novel features. First, Reverb tracks precise value provenance, allowing a developer to quickly identify the reads and writes to JavaScript state that affected a particular variable's value. Second, Reverb enables speculative bug fix analysis. A developer can replay a program to a certain point, change code or data in the program, and then resume the replay; Reverb uses the remaining log of nondeterministic events to influence the post-edit replay, allowing the developer to investigate whether the hypothesized bug fix would have helped the original execution run. Third, Reverb supports wide-area debugging for applications whose server-side components use event-driven architectures. By tracking the data flows between clients and servers, Reverb enables speculative replaying of the distributed application.

Citations: 1

DCUDA: Dynamic GPU Scheduling with Live Migration Support

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362714

Fan Guo, Yongkun Li, John C.S. Lui, Yinlong Xu

Abstract: In clouds and data centers, GPU servers consisting of multiple GPUs are widely deployed. Current state-of-the-art GPU scheduling algorithms are "static" in assigning applications to different GPUs. These algorithms usually ignore the dynamics of GPU utilization and are often inaccurate in estimating resource demand before assigning and running applications, so there is a large opportunity to further balance load and improve GPU utilization. Based on CUDA (Compute Unified Device Architecture), we develop a runtime system called DCUDA, which supports "dynamic" scheduling of running applications between multiple GPUs. In particular, DCUDA provides a real-time and lightweight method to accurately monitor the resource demand of applications and GPU utilization. Furthermore, it provides a universal migration facility to migrate running applications between GPUs with negligible overhead. More importantly, DCUDA transparently supports all CUDA applications without changes to their source code. Experiments with our prototype system show that DCUDA reduces the overloaded time of GPUs by 78.3% on average. As a result, for the different workloads we studied, which consist of a wide range of applications, DCUDA reduces the average execution time of applications by up to 42.1%. Furthermore, DCUDA reduces energy consumption by 13.3% in the light-load scenario.
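
The contrast between static assignment and dynamic rescheduling can be illustrated with a small sketch. The abstract does not state DCUDA's scheduling policy, so the greedy rule below is an assumption made purely for illustration, as are the function name `rebalance`, the `capacity` parameter, and the dictionary layout. Migration of actual CUDA state is not modeled; the sketch only decides which running application would move where once monitored demand shows a GPU to be overloaded.

```python
def rebalance(assignments, demand, capacity=1.0):
    """Greedy rebalancing sketch (hypothetical policy, not DCUDA's).

    assignments: dict mapping GPU name -> list of application names (mutated in place)
    demand:      dict mapping application name -> monitored utilization fraction
    Returns the list of (app, source_gpu, destination_gpu) migrations chosen.
    """
    load = {gpu: sum(demand[a] for a in apps) for gpu, apps in assignments.items()}
    migrations = []
    while True:
        src = max(load, key=load.get)  # most loaded GPU
        dst = min(load, key=load.get)  # least loaded GPU
        if load[src] <= capacity:
            break                      # nothing is overloaded any more
        # Only consider applications that fit on the destination without overloading it.
        candidates = [a for a in assignments[src] if load[dst] + demand[a] <= capacity]
        if not candidates:
            break                      # no safe move exists
        app = min(candidates, key=lambda a: demand[a])  # move the lightest one
        assignments[src].remove(app)
        assignments[dst].append(app)
        load[src] -= demand[app]
        load[dst] += demand[app]
        migrations.append((app, src, dst))
    return migrations
```

For example, with applications `a` (0.6) and `b` (0.5) on `gpu0` and `c` (0.2) on `gpu1`, the sketch moves `b` to `gpu1`, after which neither GPU exceeds its capacity.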

Citations: 12

Sifter

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362736

P. Las-Casas, Giorgi Papakerashvili, Vaastav Anand, Jonathan Mace

Abstract: Distributed tracing is a core component of cloud and datacenter systems, and provides visibility into their end-to-end runtime behavior. To reduce computational and storage overheads, most tracing frameworks do not keep all traces, but sample them uniformly at random. While effective at reducing overheads, uniform random sampling inevitably captures redundant, common-case execution traces, which are less useful for analysis and troubleshooting tasks. In this work we present Sifter, a general-purpose framework for biased trace sampling. Sifter captures qualitatively more diverse traces by weighting sampling decisions towards edge-case code paths, infrequent request types, and anomalous events. Sifter does so by using the incoming stream of traces to build an unbiased low-dimensional model that approximates the system's common-case behavior. Sifter then biases sampling decisions towards traces that are poorly captured by this model. We have implemented Sifter, integrated it with several open-source tracing systems, and evaluated it with traces from a range of open-source and production distributed systems. Our evaluation shows that Sifter effectively biases towards anomalous and outlier executions, is robust to noisy and heterogeneous traces, is efficient and scalable, and adapts to changes in workloads over time.
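
The difference between uniform and biased sampling can be shown with a deliberately simplified sketch. It swaps Sifter's low-dimensional model for a plain frequency count per trace signature, so it is a stand-in for the general idea and not the paper's method. The class name `RarityBiasedSampler`, the notion of a string "signature", and the `1 / count` weighting are all assumptions.

```python
import random
from collections import Counter


class RarityBiasedSampler:
    """Keeps rarely seen trace signatures with higher probability (illustrative only)."""

    def __init__(self, seed=0):
        self.counts = Counter()         # running model of common-case behavior
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def weight(self, signature):
        """Update the model with this trace and return its sampling probability."""
        self.counts[signature] += 1
        # A signature seen n times so far gets probability 1/n.
        return 1.0 / self.counts[signature]

    def should_keep(self, signature):
        """Make a randomized keep/drop decision biased towards rare signatures."""
        return self.rng.random() < self.weight(signature)
```

Under this rule the tenth occurrence of a common request is kept with probability 0.1, while a signature seen for the first time is always kept. A uniform sampler would treat both identically.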

Citations: 28

Accordia

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3365441

Yang Liu, Huanle Xu, W. Lau

Citations: 1

HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362719

Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph Gonzalez, I. Stoica, Alexey Tumanov

Abstract: Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times. Commonly, these model training workloads collectively search over a large number of parameter values that control the learning process in a hyperparameter search. It is preferable to identify and maximally provision the best-performing hyperparameter configuration (trial) to achieve the highest-accuracy result as soon as possible. To optimally trade off evaluating multiple configurations against training the most promising ones by a fixed deadline, we design and build HyperSched, a dynamic application-level resource scheduler that tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline. HyperSched leverages three properties of a hyperparameter search workload overlooked in prior work (trial disposability, progressively identifiable rankings among different configurations, and space-time constraints) to outperform standard hyperparameter search algorithms across a variety of benchmarks.

Citations: 33

Towards a Library for Deterministic Failure Testing of Distributed Systems

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3366026

Armin Balalaie, James A. Jones

Author: Armin Balalaie. Advisor: James A. Jones.

Abstract: Distributed systems are widespread today, serving millions of customers and processing huge amounts of data. These systems run on commodity hardware and in an environment with many uncertainties, e.g., partial network failures and race conditions between nodes. Testing distributed systems requires new test libraries that take these uncertainties into account and can reproduce scenarios with specific timing constraints in a programming-language-agnostic way. To this end, we present Failify, a cross-platform, programming-language-agnostic, and deterministic failure testing library for distributed systems, which can be seamlessly integrated into different build systems. Failify, as an infrastructure, can also facilitate research in testing distributed systems in various ways. We experimented with six open-source distributed systems to show the compactness of Failify's deployment API. Our results indicate that, on average, the most reliable deployment architecture for these systems can be defined in fewer than 17 lines of code. We also experimented with HDFS to demonstrate potential scenarios where Failify's deterministic environmental manipulation and failure injection API can be effective.

Citations: 0

PerfDebug

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362727

Jason Teoh, Muhammad Ali Gulzar, Guoqing Harry Xu, Miryung Kim

Abstract: Performance is a key factor for big data applications, and much research has been devoted to optimizing these applications. While prior work can diagnose and correct data skew, the problem of computation skew (abnormally high computation costs for a small subset of input data) has been largely overlooked. Computation skew commonly occurs in real-world applications, yet no tool is available for developers to pinpoint its underlying causes. To enable a user to debug applications that exhibit computation skew, we develop a post-mortem performance debugging tool. PerfDebug automatically finds input records responsible for such abnormalities in a big data application by reasoning about deviations in performance metrics such as job execution time, garbage collection time, and serialization time. The key to PerfDebug's success is a data provenance-based technique that computes and propagates record-level computation latency to keep track of abnormally expensive records throughout the pipeline. Finally, the input records that have the largest latency contributions are presented to the user for bug fixing. We evaluate PerfDebug via in-depth case studies and observe that remediation such as removing the single most expensive record or a simple code rewrite can achieve up to a 16x performance improvement.
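
Record-level latency tracking can be sketched in a few lines. This is a toy version for a pipeline of one-to-one stages only; it does not model how PerfDebug propagates latency through aggregations or shuffles in a real big data framework. The function names `run_with_latency` and `most_expensive` and the injectable `clock` parameter are hypothetical.

```python
import time


def run_with_latency(records, stages, clock=time.perf_counter):
    """Apply every stage to every record, accumulating latency per input record.

    Returns a list of (input_record, final_value, total_latency) tuples.
    """
    results = []
    for record in records:
        value, latency = record, 0.0
        for stage in stages:
            start = clock()
            value = stage(value)
            latency += clock() - start  # attribute this stage's cost to the input record
        results.append((record, value, latency))
    return results


def most_expensive(results, k=1):
    """Return the k input records with the largest accumulated latency."""
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    return [record for record, _, _ in ranked[:k]]
```

Passing a custom `clock` makes the sketch deterministic for experiments. With the default `time.perf_counter`, the ranking reflects real wall-clock cost and will vary between runs.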

Citations: 7

Neptune

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2019-11-20 DOI: 10.1145/3357223.3362724

Panagiotis Garefalakis, Konstantinos Karanasos, P. Pietzuch

Citations: 14