{"title":"Sketches of space: ownership accounting for shared storage","authors":"Jake Wires, P. Ganesan, A. Warfield","doi":"10.1145/3127479.3132021","DOIUrl":"https://doi.org/10.1145/3127479.3132021","url":null,"abstract":"Efficient snapshots are an important feature of modern storage systems. However, the implicit sharing underlying most snapshot implementations makes it difficult to answer basic questions about the storage costs of individual snapshots. Traditional techniques for answering these questions incur significant performance penalties due to expensive metadata overheads. We present a novel probabilistic data structure, compatible with existing storage systems, that can provide approximate answers about snapshot costs with very low computational and storage overheads while achieving better than 95% accuracy for real-world data sets.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75382740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SQML: large-scale in-database machine learning with pure SQL","authors":"Umar Syed, Sergei Vassilvitskii","doi":"10.1145/3127479.3132746","DOIUrl":"https://doi.org/10.1145/3127479.3132746","url":null,"abstract":"Many enterprises have migrated their data from an on-site database to a cloud-based database-as-a-service that handles all database-related administrative tasks while providing a simple SQL interface to the end user. Businesses are also increasingly relying on machine learning to understand their customers and develop new products. Given these converging trends, there is a pressing need for database-as-a-service providers to add support for sophisticated machine learning algorithms to the core functionality of their products.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73625262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An implementation of fast memset() using hardware accelerators: extended abstract","authors":"K. Pusukuri, R. Gardner, Jared C. Smolens","doi":"10.1145/3127479.3132573","DOIUrl":"https://doi.org/10.1145/3127479.3132573","url":null,"abstract":"Multicore systems with large caches and huge main memories have become ubiquitous. They provide an attractive opportunity to maximize performance of big-memory applications such as in-memory databases, key-value stores, and graph analytics. However, these big-memory applications require many virtual-to-physical address translations, which increase TLB miss rate and hurt performance. To address this problem, modern hardware and OSes introduced support for huge pages. For example, on SPARC M7, Linux supports 8MB, 2GB, and 16GB huge pages (in addition to the default 8KB). Likewise, Linux supports 2MB and 1GB huge pages on Intel Xeon (E5-2630) platforms.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76943924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BestConfig: tapping the performance potential of systems via automatic configuration tuning","authors":"Yuqing Zhu, Jianxun Liu, Mengying Guo, Yungang Bao, Wenlong Ma, Zhuoyue Liu, Kunpeng Song, Y. Yang","doi":"10.1145/3127479.3128605","DOIUrl":"https://doi.org/10.1145/3127479.3128605","url":null,"abstract":"An ever-increasing number of configuration parameters are provided to system users. But many users have used one configuration setting across different workloads, leaving untapped the performance potential of systems. A good configuration setting can greatly improve the performance of a deployed system under certain workloads. But with tens or hundreds of parameters, it becomes a highly costly task to decide which configuration setting leads to the best performance. While such a task requires strong expertise in both the system and the application, users commonly lack such expertise. To help users tap the performance potential of systems, we present BestConfig, a system for automatically finding a best configuration setting within a resource limit for a deployed system under a given application workload. BestConfig is designed with an extensible architecture to automate the configuration tuning for general systems. To tune system configurations within a resource limit, we propose the divide-and-diverge sampling method and the recursive bound-and-search algorithm. BestConfig can improve the throughput of Tomcat by 75%, that of Cassandra by 63%, that of MySQL by 430%, and reduce the running time of a Hive join job by about 50% and that of a Spark join job by about 80%, solely by configuration adjustment.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75006110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FSP: towards flexible synchronous parallel framework for expectation-maximization based algorithms on cloud","authors":"Zhigang Wang, Lixin Gao, Yu Gu, Y. Bao, Ge Yu","doi":"10.1145/3127479.3128612","DOIUrl":"https://doi.org/10.1145/3127479.3128612","url":null,"abstract":"A myriad of parameter estimation algorithms can be performed using an Expectation-Maximization (EM) approach. Traditional synchronous frameworks can parallelize these EM algorithms on the cloud to accelerate computation while guaranteeing convergence. However, expensive synchronization costs pose great challenges for efficiency. Asynchronous solutions have recently been designed to bypass high-cost synchronous barriers, but at the expense of potentially losing the convergence guarantee. This paper first proposes a flexible synchronous parallel framework (FSP) that supports synchronous implementations of EM algorithms while significantly reducing the barrier cost. Under FSP, every distributed worker can immediately suspend local computation when necessary to quickly synchronize with the others. This maximizes the time fast workers spend doing useful work instead of waiting for slow, straggling workers. We then formally prove the convergence of the algorithm. Further, we analyze how to automatically identify a proper barrier interval that strikes a good balance between reduced synchronization costs and convergence speed. Empirical results demonstrate that on a broad spectrum of real-world and synthetic datasets, FSP achieves as much as a 3x speedup over the state-of-the-art synchronous solution.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78047355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UNO: unifying host and smart NIC offload for flexible packet processing","authors":"Yanfang Le, Hyunseok Chang, S. Mukherjee, Limin Wang, Aditya Akella, M. Swift, T. V. Lakshman","doi":"10.1145/3127479.3132252","DOIUrl":"https://doi.org/10.1145/3127479.3132252","url":null,"abstract":"Increasingly, smart Network Interface Cards (sNICs) are being used in data centers to offload networking functions (NFs) from host processors, thereby making these processors available for tenant applications. Modern sNICs have fully programmable, energy-efficient multi-core processors on which many packet processing functions, including a full-blown programmable switch, can run. However, having multiple switch instances deployed across the host hypervisor and the attached sNICs makes controlling them difficult and data plane operations more complex. This paper proposes a generalized SDN-controlled NF offload architecture called UNO. It can transparently offload dynamically selected host processors' packet processing functions to sNICs by using multiple switches in the host while keeping the data center-wide network control and management planes unmodified. UNO exposes a single virtual control plane to the SDN controller and hides dynamic NF offload behind a unified virtual management plane. This enables UNO to make optimal use of the host's and sNIC's combined packet processing capabilities with local optimization based on locally observed traffic patterns and resource consumption, and without central controller involvement. Experiments with a real UNO prototype in realistic scenarios show promising results: it can save processing worth up to 8 CPU cores, reduce power usage by up to 2x, and reduce the control plane overhead by more than 50%.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80942035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload analysis and caching strategies for search advertising systems","authors":"Conglong Li, D. Andersen, Qiang Fu, S. Elnikety, Yuxiong He","doi":"10.1145/3127479.3129255","DOIUrl":"https://doi.org/10.1145/3127479.3129255","url":null,"abstract":"Search advertising depends on accurate predictions of user behavior and interest, accomplished today using complex and computationally expensive machine learning algorithms that estimate the potential revenue gain of thousands of candidate advertisements per search query. The accuracy of this estimation is important for revenue, but the cost of these computations represents a substantial expense, e.g., 10% to 30% of the total gross revenue. Caching the results of previous computations is a potential path to reducing this expense, but traditional domain-agnostic and revenue-agnostic approaches to do so result in substantial revenue loss. This paper presents three domain-specific caching mechanisms that successfully optimize for both factors. Simulations on a trace from the Bing advertising system show that a traditional cache can reduce cost by up to 27.7% but has negative revenue impact as bad as -14.1%. On the other hand, the proposed mechanisms can reduce cost by up to 20.6% while capping revenue impact between -1.3% and 0%. Based on Microsoft's earnings release for FY16 Q4, the traditional cache would reduce the net profit of Bing Ads by $84.9 to $166.1 million in the quarter, while our proposed cache could increase the net profit by $11.1 to $71.5 million.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88997614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Remote memory in the age of fast networks","authors":"M. Aguilera, Nadav Amit, I. Calciu, Xavier Deguillard, Jayneel Gandhi, Pratap Subrahmanyam, L. Suresh, K. Tati, Rajesh Venkatasubramanian, M. Wei","doi":"10.1145/3127479.3131612","DOIUrl":"https://doi.org/10.1145/3127479.3131612","url":null,"abstract":"As the latency of the network approaches that of memory, it becomes increasingly attractive for applications to use remote memory---random-access memory at another computer that is accessed using the virtual memory subsystem. This is an old idea whose time has come, in the age of fast networks. To work effectively, remote memory must address many technical challenges. In this paper, we enumerate these challenges, discuss their feasibility, explain how some of them are addressed by recent work, and indicate other promising ways to tackle them. Some challenges remain as open problems, while others deserve more study. In this paper, we hope to provide a broad research agenda around this topic, by proposing more problems than solutions.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82436920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking reinforcement learning for cloud elasticity","authors":"K. Lolos, I. Konstantinou, Verena Kantere, N. Koziris","doi":"10.1145/3127479.3131211","DOIUrl":"https://doi.org/10.1145/3127479.3131211","url":null,"abstract":"Cloud elasticity, i.e., the dynamic allocation of resources to applications to meet fluctuating workload demands, has been one of the greatest challenges in cloud computing. Approaches based on reinforcement learning have been proposed but they require a large number of states in order to model complex application behavior. In this work we propose a novel reinforcement learning approach that employs adaptive state space partitioning. The idea is to start from one state that represents the entire environment and partition this into finer-grained states adaptively to the observed workload and system behavior following a decision-tree approach. We explore novel statistical criteria and strategies that decide both the correct parameters and the appropriate time to perform the partitioning.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82508476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incentivizing self-capping to increase cloud utilization","authors":"Mohammad Shahrad, C. Klein, Liang Zheng, M. Chiang, E. Elmroth, D. Wentzlaff","doi":"10.1145/3127479.3128611","DOIUrl":"https://doi.org/10.1145/3127479.3128611","url":null,"abstract":"Cloud Infrastructure as a Service (IaaS) providers continually seek higher resource utilization to better amortize capital costs. Higher utilization not only can enable higher profit for IaaS providers but also provides a mechanism to raise energy efficiency, thereby creating greener cloud services. Unfortunately, achieving high utilization is difficult, mainly because infrastructure providers need to maintain spare capacity to service demand fluctuations. Graceful degradation is a self-adaptation technique originally designed for constructing robust services that survive resource shortages. Previous work has shown that graceful degradation can also be used to improve resource utilization in the cloud by absorbing demand fluctuations and reducing spare capacity. In this work, we build a system and pricing model that enables infrastructure providers to incentivize their tenants to use graceful degradation. By using graceful degradation with an appropriate pricing model, the infrastructure provider can realize higher resource utilization while, simultaneously, its tenants can increase their profit. Our proposed solution is based on a hybrid model which guarantees both reserved and peak on-demand capacities over flexible periods. It also includes a global dynamic price pair for capacity which remains uniform during each tenant's Service Level Agreement (SLA) term. We evaluate our scheme using simulations based on real-world traces and also implement a prototype using RUBiS on the Xen hypervisor as an end-to-end demonstration. Our analysis shows that the proposed scheme never hurts a tenant's net profit, but can improve it by as much as 93%. Simultaneously, it can also improve the effective utilization of contracts from 42% to as high as 99%.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78035924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}