Di Wang, Sriram Govindan, A. Sivasubramaniam, A. Kansal, Jie Liu, Badriddine M. Khessib
{"title":"Underprovisioning backup power infrastructure for datacenters","authors":"Di Wang, Sriram Govindan, A. Sivasubramaniam, A. Kansal, Jie Liu, Badriddine M. Khessib","doi":"10.1145/2541940.2541966","DOIUrl":"https://doi.org/10.1145/2541940.2541966","url":null,"abstract":"While there has been prior work to underprovision the power distribution infrastructure for a datacenter to save costs, the ability to underprovision the backup power infrastructure, which contributes significantly to capital costs, is little explored. There are two main components in the backup infrastructure - Diesel Generators (DGs) and UPS units - which can both be underprovisioned (or even removed) in terms of their power and/or energy capacities. However, embarking on such underprovisioning mandates studying several ramifications - the resulting cost savings, the lower availability, and the performance and state loss consequences on individual applications - concurrently. This paper presents the first such study, considering cost, availability, performance and application consequences of underprovisioning the backup power infrastructure. We present a framework to quantify the cost of backup capacity that is provisioned, and implement techniques leveraging existing software and hardware mechanisms to provide as seamless an operation as possible for an application within the provisioned backup capacity during a power outage. We evaluate the cost-performance-availability trade-offs for different levels of backup underprovisioning for applications with diverse reliance on the backup infrastructure. Our results show that one may be able to completely do away with DGs, compensating for it with additional UPS energy capacities, to significantly cut costs and still be able to handle power outages lasting as high as 40 minutes (which constitute bulk of the outages). Further, we can push the limits of outage duration that can be handled in a cost-effective manner, if applications are willing to tolerate degraded performance during the outage. Our evaluations also show that different applications react differently to the outage handling mechanisms, and that the efficacy of the mechanisms is sensitive to the outage duration. The insights from this paper can spur new opportunities for future work on backup power infrastructure optimization.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130077889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Parallelism I","authors":"J. Torrellas","doi":"10.1145/3260930","DOIUrl":"https://doi.org/10.1145/3260930","url":null,"abstract":"","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126475608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deterministic galois: on-demand, portable and parameterless","authors":"Donald Nguyen, Andrew Lenharth, K. Pingali","doi":"10.1145/2541940.2541964","DOIUrl":"https://doi.org/10.1145/2541940.2541964","url":null,"abstract":"Non-determinism in program execution can make program development and debugging difficult. In this paper, we argue that solutions to this problem should be on-demand, portable and parameterless. On-demand means that the programming model should permit the writing of non-deterministic programs since these programs often perform better than deterministic ones for the same problem. Portable means that the program should produce the same answer even if it is run on different machines. Parameterless means that if there are machine-dependent scheduling parameters that must be tuned for good performance, they must not affect the output. Although many solutions for deterministic program execution have been proposed in the literature, they fall short along one or more of these dimensions. To remedy this, we propose a new approach, based on the Galois programming model, in which (i) the programming model permits the writing of non-deterministic programs and (ii) the runtime system executes these programs deterministically if needed. Evaluation of this approach on a collection of benchmarks from the PARSEC, PBBS, and Lonestar suites shows that it delivers deterministic execution with substantially less overhead than other systems in the literature.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126579117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ubik: efficient cache sharing with strict qos for latency-critical workloads","authors":"H. Kasture, Daniel Sánchez","doi":"10.1145/2541940.2541944","DOIUrl":"https://doi.org/10.1145/2541940.2541944","url":null,"abstract":"Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3x, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"51 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126072403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Approximate computing","authors":"L. Ceze","doi":"10.1145/3260921","DOIUrl":"https://doi.org/10.1145/3260921","url":null,"abstract":"","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121699833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding trojan message vulnerabilities in distributed systems","authors":"Radu Banabic, George Candea, R. Guerraoui","doi":"10.1145/2541940.2541984","DOIUrl":"https://doi.org/10.1145/2541940.2541984","url":null,"abstract":"Trojan messages are messages that seem correct to the receiver but cannot be generated by any correct sender. Such messages constitute major vulnerability points of a distributed system---they constitute ideal targets for a malicious actor and facilitate failure propagation across nodes. We describe Achilles, a tool that searches for Trojan messages in a distributed system. Achilles uses dynamic white-box analysis on the distributed system binaries in order to infer the predicate that defines messages parsed by receiver nodes and generated by sender nodes, respectively, and then computes Trojan messages as the difference between the two. We apply Achilles on implementations of real distributed systems: FSP, a file transfer application, and PBFT, a Byzantine-fault-tolerant state machine replication library. Achilles discovered a new bug in FSP and rediscovered a previously known vulnerability in PBFT. In our evaluation we demonstrate that our approach can perform orders of magnitude better than general approaches based on regular fuzzing and symbolic execution.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131572146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual ghost: protecting applications from hostile operating systems","authors":"J. Criswell, Nathan Dautenhahn, Vikram S. Adve","doi":"10.1145/2541940.2541986","DOIUrl":"https://doi.org/10.1145/2541940.2541986","url":null,"abstract":"Applications that process sensitive data can be carefully designed and validated to be difficult to attack, but they are usually run on monolithic, commodity operating systems, which may be less secure. An OS compromise gives the attacker complete access to all of an application's data, regardless of how well the application is built. We propose a new system, Virtual Ghost, that protects applications from a compromised or even hostile OS. Virtual Ghost is the first system to do so by combining compiler instrumentation and run-time checks on operating system code, which it uses to create ghost memory that the operating system cannot read or write. Virtual Ghost interposes a thin hardware abstraction layer between the kernel and the hardware that provides a set of operations that the kernel must use to manipulate hardware, and provides a few trusted services for secure applications such as ghost memory management, encryption and signing services, and key management. Unlike previous solutions, Virtual Ghost does not use a higher privilege level than the kernel. Virtual Ghost performs well compared to previous approaches; it outperforms InkTag on five out of seven of the LMBench microbenchmarks with improvements between 1.3x and 14.3x. For network downloads, Virtual Ghost experiences a 45% reduction in bandwidth at most for small files and nearly no reduction in bandwidth for large files and web traffic. An application we modified to use ghost memory shows a maximum additional overhead of 5% due to the Virtual Ghost protections. We also demonstrate Virtual Ghost's efficacy by showing how it defeats sophisticated rootkit attacks.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"361 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133817873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eric M. Schulte, Jonathan Dorn, Stephen Harding, S. Forrest, Westley Weimer
{"title":"Post-compiler software optimization for reducing energy","authors":"Eric M. Schulte, Jonathan Dorn, Stephen Harding, S. Forrest, Westley Weimer","doi":"10.1145/2541940.2541980","DOIUrl":"https://doi.org/10.1145/2541940.2541980","url":null,"abstract":"Modern compilers typically optimize for executable size and speed, rarely exploring non-functional properties such as power efficiency. These properties are often hardware-specific, time-intensive to optimize, and may not be amenable to standard dataflow optimizations. We present a general post-compilation approach called Genetic Optimization Algorithm (GOA), which targets measurable non-functional aspects of software execution in programs that compile to x86 assembly. GOA combines insights from profile-guided optimization, superoptimization, evolutionary computation and mutational robustness. GOA searches for program variants that retain required functional behavior while improving non-functional behavior, using characteristic workloads and predictive modeling to guide the search. The resulting optimizations are validated using physical performance measurements and a larger held-out test suite. Our experimental results on PARSEC benchmark programs show average energy reductions of 20%, both for a large AMD system and a small Intel system, while maintaining program functionality on target workloads.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128256313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ouyang Jian, Shiding Lin, Song Jiang, Zhenyu Hou, Yong Wang, Yuanzheng Wang
{"title":"SDF: software-defined flash for web-scale internet storage systems","authors":"Ouyang Jian, Shiding Lin, Song Jiang, Zhenyu Hou, Yong Wang, Yuanzheng Wang","doi":"10.1145/2541940.2541959","DOIUrl":"https://doi.org/10.1145/2541940.2541959","url":null,"abstract":"In the last several years hundreds of thousands of SSDs have been deployed in the data centers of Baidu, China's largest Internet search company. Currently only 40% or less of the raw bandwidth of the flash memory in the SSDs is delivered by the storage system to the applications. Moreover, because of space over-provisioning in the SSD to accommodate non-sequential or random writes, and additionally, parity coding across flash channels, typically only 50-70% of the raw capacity of a commodity SSD can be used for user data. Given the large scale of Baidu's data center, making the most effective use of its SSDs is of great importance. Specifically, we seek to maximize both bandwidth and usable capacity. To achieve this goal we propose {em software-defined flash} (SDF), a hardware/software co-designed storage system to maximally exploit the performance characteristics of flash memory in the context of our workloads. SDF exposes individual flash channels to the host software and eliminates space over-provisioning. The host software, given direct access to the raw flash channels of the SSD, can effectively organize its data and schedule its data access to better realize the SSD's raw performance potential. Currently more than 3000 SDFs have been deployed in Baidu's storage system that supports its web page and image repository services. Our measurements show that SDF can deliver approximately 95% of the raw flash bandwidth and provide 99% of the flash capacity for user data. SDF increases I/O bandwidth by 300% and reduces per-GB hardware cost by 50% on average compared with the commodity-SSD-based system used at Baidu.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115978840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sandeep R. Agrawal, Valentin Pistol, Jun Pang, J. Tran, D. Tarjan, A. Lebeck
{"title":"Rhythm: harnessing data parallel hardware for server workloads","authors":"Sandeep R. Agrawal, Valentin Pistol, Jun Pang, J. Tran, D. Tarjan, A. Lebeck","doi":"10.1145/2541940.2541956","DOIUrl":"https://doi.org/10.1145/2541940.2541956","url":null,"abstract":"Trends in increasing web traffic demand an increase in server throughput while preserving energy efficiency and total cost of ownership. Present work in optimizing data center efficiency primarily focuses on the data center as a whole, using off-the-shelf hardware for individual servers. Server capacity is typically increased by adding more machines, which is cheap, though inefficient in the long run in terms of energy and area. Our work builds on the observation that server workload execution patterns are not completely unique across multiple requests. We present a framework---called Rhythm---for high throughput servers that can exploit similarity across requests to improve server performance and power/energy efficiency by launching data parallel executions for request cohorts. An implementation of the SPECWeb Banking workload using Rhythm on NVIDIA GPUs provides a basis for evaluating both software and hardware for future cohort-based servers. Our evaluation of Rhythm on future server platforms shows that it achieves 4x the throughput (reqs/sec) of a core i7 at efficiencies (reqs/Joule) comparable to a dual core ARM Cortex A9. A Rhythm implementation that generates transposed responses achieves 8x the i7 throughput while processing 2.5x more requests/Joule compared to the A9.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116384488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}