{"title":"Prefetching mobile ads: can advertising systems afford it?","authors":"Prashanth Mohan, Suman Nath, Oriana Riva","doi":"10.1145/2465351.2465378","DOIUrl":"https://doi.org/10.1145/2465351.2465378","url":null,"abstract":"Mobile app marketplaces are dominated by free apps that rely on advertising for their revenue. These apps place increased demands on the already limited battery lifetime of modern phones. For example, in the top 15 free Windows Phone apps, we found in-app advertising contributes to 65% of the app's total communication energy (or 23% of the app's total energy). Despite their small size, downloading ads each time an app is started and at regular refresh intervals forces the network radio to be continuously woken up, thus leading to a high energy overhead, so-called 'tail energy' problem. A straightforward mechanism to lower this overhead is to prefetch ads in bulk and serve them locally. However, the prefetching of ads is at odds with the real-time nature of modern advertising systems wherein ads are sold through real-time auctions each time the client can display an ad.\u0000 This paper addresses the challenge of supporting ad prefetching with minimal changes to the existing advertising architecture. We build client models predicting how many ad slots are likely to be available in the future. Based on this (unreliable) estimate, ad servers make client ad slots available in the ad exchange auctions even before they can be displayed. In order to display the ads within a short deadline, ads are probabilistically replicated across clients, using an overbooking model designed to ensure that ads are shown before their deadline expires (SLA violation rate) and are shown no more than required (revenue loss). With traces of over 1,700 iPhone and Windows Phone users, we show that our approach can reduce the ad energy overhead by over 50% with a negligible revenue loss and SLA violation rate.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"73 1","pages":"267-280"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86359866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zuhair Khayyat, Karim Awara, Amani Alonazi, H. Jamjoom, Dan Williams, Panos Kalnis
{"title":"Mizan: a system for dynamic load balancing in large-scale graph processing","authors":"Zuhair Khayyat, Karim Awara, Amani Alonazi, H. Jamjoom, Dan Williams, Panos Kalnis","doi":"10.1145/2465351.2465369","DOIUrl":"https://doi.org/10.1145/2465351.2465369","url":null,"abstract":"Pregel [23] was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing end-to-end computation. Especially where data is very large or the runtime behavior of the algorithm is unknown, an adaptive approach is needed. To this end, we introduce Mizan, a Pregel system that achieves efficient load balancing to better adapt to changes in computing needs. Unlike known implementations of Pregel, Mizan does not assume any a priori knowledge of the structure of the graph or behavior of the algorithm. Instead, it monitors the runtime characteristics of the system. Mizan then performs efficient fine-grained vertex migration to balance computation and communication. We have fully implemented Mizan; using extensive evaluation we show that---especially for highly-dynamic workloads---Mizan provides up to 84% improvement over techniques leveraging static graph pre-partitioning.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"90 1","pages":"169-182"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81548309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimus: a dynamic rewriting framework for data-parallel execution plans","authors":"Qifa Ke, M. Isard, Yuan Yu","doi":"10.1145/2465351.2465354","DOIUrl":"https://doi.org/10.1145/2465351.2465354","url":null,"abstract":"In distributed data-parallel computing, a user program is compiled into an execution plan graph (EPG), typically a directed acyclic graph. This EPG is the core data structure used by modern distributed execution engines for task distribution, job management, and fault tolerance. Once submitted for execution, the EPG remains largely unchanged at runtime except for some limited modifications. This makes it difficult to employ dynamic optimization techniques that could substantially improve the distributed execution based on runtime information.\u0000 This paper presents Optimus, a framework for dynamically rewriting an EPG at runtime. Optimus extends dynamic rewrite mechanisms present in systems such as Dryad and CIEL by integrating rewrite policy with a high-level data-parallel language, in this case DryadLINQ. This integration enables optimizations that require knowledge of the semantics of the computation, such as language customizations for domain-specific computations including matrix algebra. We describe the design and implementation of Optimus, outline its interfaces, and detail a number of rewriting techniques that address problems arising in distributed execution including data skew, dynamic data re-partitioning, handling unbounded iterative computations, and protecting important intermediate data for fault tolerance. We evaluate Optimus with real applications and data and show significant performance gains compared to manual optimization or customized systems. We demonstrate the versatility of dynamic EPG rewriting for data-parallel computing, and argue that it is an essential feature of any general-purpose distributed dataflow execution engine.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"21 1","pages":"15-28"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73444475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Myeongjae Jeon, Yuxiong He, S. Elnikety, A. Cox, S. Rixner
{"title":"Adaptive parallelism for web search","authors":"Myeongjae Jeon, Yuxiong He, S. Elnikety, A. Cox, S. Rixner","doi":"10.1145/2465351.2465367","DOIUrl":"https://doi.org/10.1145/2465351.2465367","url":null,"abstract":"A web search query made to Microsoft Bing is currently parallelized by distributing the query processing across many servers. Within each of these servers, the query is, however, processed sequentially. Although each server may be processing multiple queries concurrently, with modern multicore servers, parallelizing the processing of an individual query within the server may nonetheless improve the user's experience by reducing the response time. In this paper, we describe the issues that make the parallelization of an individual query within a server challenging, and we present a parallelization approach that effectively addresses these challenges. Since each server may be processing multiple queries concurrently, we also present a adaptive resource management algorithm that chooses the degree of parallelism at run-time for each query, taking into account system load and parallelization efficiency. As a result, the servers now execute queries with a high degree of parallelism at low loads, gracefully reduce the degree of parallelism with increased load, and choose sequential execution under high load. We have implemented our parallelization approach and adaptive resource management algorithm in Bing servers and evaluated them experimentally with production workloads. The experimental results show that the mean and 95th-percentile response times for queries are reduced by more than 50% under light or moderate load. Moreover, under high load where parallelization adversely degrades the system performance, the response times are kept the same as when queries are executed sequentially. In all cases, we observe no degradation in the relevance of the search results.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"28 1","pages":"155-168"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83448893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Choosy: max-min fair sharing for datacenter jobs with constraints","authors":"A. Ghodsi, M. Zaharia, S. Shenker, I. Stoica","doi":"10.1145/2465351.2465387","DOIUrl":"https://doi.org/10.1145/2465351.2465387","url":null,"abstract":"Max-Min Fairness is a flexible resource allocation mechanism used in most datacenter schedulers. However, an increasing number of jobs have hard placement constraints, restricting the machines they can run on due to special hardware or software requirements. It is unclear how to define, and achieve, max-min fairness in the presence of such constraints. We propose Constrained Max-Min Fairness (CMMF), an extension to max-min fairness that supports placement constraints, and show that it is the only policy satisfying an important property that incentivizes users to pool resources. Optimally computing CMMF is challenging, but we show that a remarkably simple online scheduler, called Choosy, approximates the optimal scheduler well. Through experiments, analysis, and simulations, we show that Choosy on average differs 2% from the optimal CMMF allocation, and lets jobs achieve their fair share quickly.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"30 1","pages":"365-378"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75785185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"hClock: hierarchical QoS for packet scheduling in a hypervisor","authors":"J. Billaud, Ajay Gulati","doi":"10.1145/2465351.2465382","DOIUrl":"https://doi.org/10.1145/2465351.2465382","url":null,"abstract":"Higher consolidation ratios in virtualized datacenters, SSD based storage arrays and converged IO fabric are mandating the need for more flexible and powerful models for physical network adapter bandwidth allocation. Existing solutions such as fair-queuing mechanisms, traffic-shapers and hierarchical allocation techniques are insufficient in terms of dealing with various use cases for a cloud service provider.\u0000 In this paper, we present the design and implementation of a hierarchical bandwidth allocation algorithm called hClock. hClock supports three controls in a hierarchical manner: (1) minimum bandwidth guarantee, (2) rate limit and (3) shares (a.k.a weight). Our prototype implementation in VMware ESX server hypervisor, built with several optimizations to minimize CPU overhead and increase parallelism, shows that hClock is able to enforce hierarchical controls for a variety of workloads with diverse traffic patterns at scale.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"40 1","pages":"309-322"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80818092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Salomie, G. Alonso, Timothy Roscoe, Kevin Elphinstone
{"title":"Application level ballooning for efficient server consolidation","authors":"T. Salomie, G. Alonso, Timothy Roscoe, Kevin Elphinstone","doi":"10.1145/2465351.2465384","DOIUrl":"https://doi.org/10.1145/2465351.2465384","url":null,"abstract":"Systems software like databases and language runtimes typically manage memory themselves to exploit application knowledge unavailable to the OS. Traditionally deployed on dedicated machines, they are designed to be statically configured with memory sufficient for peak load. In virtualization scenarios (cloud computing, server consolidation), however, static peak provisioning of RAM to applications dramatically reduces the efficiency and cost-saving benefits of virtualization. Unfortunately, existing memory \"ballooning\" techniques used to dynamically reallocate physical memory between VMs badly impact the performance of applications which manage their own memory. We address this problem by extending ballooning to applications (here, a database engine and Java runtime) so that memory can be efficiently and effectively moved between virtualized instances as the demands of each change over time. The results are significantly lower memory requirements to provide the same performance guarantees to a collocated set of VM running such applications, with minimal overhead or intrusive changes to application code.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"862 1","pages":"337-350"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81235897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Venkataraman, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, R. Schreiber
{"title":"Presto: distributed machine learning and graph processing with sparse matrices","authors":"S. Venkataraman, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, R. Schreiber","doi":"10.1145/2465351.2465371","DOIUrl":"https://doi.org/10.1145/2465351.2465371","url":null,"abstract":"It is cumbersome to write machine learning and graph algorithms in data-parallel models such as MapReduce and Dryad. We observe that these algorithms are based on matrix computations and, hence, are inefficient to implement with the restrictive programming and communication interface of such frameworks.\u0000 In this paper we show that array-based languages such as R [3] are suitable for implementing complex algorithms and can outperform current data parallel solutions. Since R is single-threaded and does not scale to large datasets, we have built Presto, a distributed system that extends R and addresses many of its limitations. Presto efficiently shares sparse structured data, can leverage multi-cores, and dynamically partitions data to mitigate load imbalance. Our results show the promise of this approach: many important machine learning and graph algorithms can be expressed in a single framework and are substantially faster than those in Hadoop and Spark.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"57 2 1","pages":"197-210"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91127705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maygh: building a CDN from client web browsers","authors":"L. Zhang, Fangfei Zhou, A. Mislove, Ravi Sundaram","doi":"10.1145/2465351.2465379","DOIUrl":"https://doi.org/10.1145/2465351.2465379","url":null,"abstract":"Over the past two decades, the web has provided dramatic improvements in the ease of sharing content. Unfortunately, the costs of distributing this content are largely incurred by web site operators; popular web sites are required to make substantial monetary investments in serving infrastructure or cloud computing resources---or must pay other organizations (e.g., content distribution networks)---to help serve content. Previous approaches to offloading some of the distribution costs onto end users have relied on client-side software or web browser plug-ins, providing poor user incentives and dramatically limiting their scope in practice.\u0000 In this paper, we present Maygh, a system that builds a content distribution network from client web browsers, without the need for additional plug-ins or client-side software. The result is an organically scalable system that distributes the cost of serving web content across the users of a web site. Through simulations based on real-world access logs from Etsy (a large e-commerce web site that is the 50th most popular web site in the U.S.), microbenchmarks, and a small-scale deployment, we demonstrate that Maygh provides substantial savings to site operators, imposes only modest costs on clients, and can be deployed on the web sites and browsers of today. In fact, if Maygh was deployed to Etsy, it would reduce network bandwidth due to static content by 75% and require only a single coordinating server.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"25 1","pages":"281-294"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90073477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Process firewalls: protecting processes during resource access","authors":"H. Vijayakumar, Joshua Schiffman, T. Jaeger","doi":"10.1145/2465351.2465358","DOIUrl":"https://doi.org/10.1145/2465351.2465358","url":null,"abstract":"Processes retrieve a variety of resources from the operating system in order to execute properly, but adversaries have several ways to trick processes into retrieving resources of the adversaries' choosing. Such resource access attacks use name resolution, race conditions, and/or ambiguities regarding which resources are controlled by adversaries, accounting for 5-10% of CVE entries over the last four years. programmers have found these attacks extremely hard to eliminate because resources are managed externally to the program, but the operating system does not provide a sufficiently rich system-call API to enable programs to block such attacks. In this paper, we present the Process Firewall, a kernel mechanism that protects processes in manner akin to a network firewall for the system-call interface. Because the Process Firewall only protects processes -- rather than sandboxing them -- it can examine their internal state to identify the protection rules necessary to block many of these attacks without the need for program modification or user configuration. We built a prototype Process Firewall for Linux demonstrating: (1) the prevention of several vulnerabilities, including two that were previously-unknown; (2) that this defense can be provided system-wide for less than 4% overhead in a variety of macrobenchmarks; and (3) that it can also improve program performance, shown by Apache handling 3-8% more requests when program resource access checks are replaced by Process Firewall rules. These results show that it is practical for the operating system to protect processes by preventing a variety of resource access attacks system-wide.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"134 1","pages":"57-70"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80168891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}