{"title":"CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters","authors":"Íñigo Goiri, Thu D. Nguyen, R. Bianchini","doi":"10.1145/2694344.2694378","DOIUrl":"https://doi.org/10.1145/2694344.2694378","url":null,"abstract":"Despite its benefits, free cooling may expose servers to high absolute temperatures, wide temperature variations, and high humidity when datacenters are sited at certain locations. Prior research (in non-free-cooled datacenters) has shown that high temperatures and/or wide temporal temperature variations can harm hardware reliability. In this paper, we identify the runtime management strategies required to limit absolute temperatures, temperature variations, humidity, and cooling energy in free-cooled datacenters. As the basis for our study, we propose CoolAir, a system that embodies these strategies. Using CoolAir and a real free-cooled datacenter prototype, we show that effective management requires cooling infrastructures that can act smoothly. In addition, we show that CoolAir can tightly manage temperature and significantly reduce temperature variation, often at a lower cooling cost than existing free-cooled datacenters. Perhaps most importantly, based on our results, we derive several principles and lessons that should guide the design of management systems for free-cooled datacenters of any size.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122473606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution","authors":"Yavuz Yetim, S. Malik, M. Martonosi","doi":"10.1145/2694344.2694354","DOIUrl":"https://doi.org/10.1145/2694344.2694354","url":null,"abstract":"As semiconductor technology scales towards ever-smaller transistor sizes, hardware fault rates are increasing. Since important application classes (e.g., multimedia, streaming workloads) are data-error-tolerant, recent research has proposed techniques that seek to save energy or improve yield by exploiting error tolerance at the architecture/microarchitecture level. Even seemingly error-tolerant applications, however, will crash or hang due to control-flow/memory addressing errors. In parallel computation, errors involving inter-thread communication can have equally catastrophic effects. Our work explores techniques that mitigate the impact of potentially catastrophic errors in parallel computation, while still garnering power, cost, or yield benefits from data error tolerance. Our proposed CommGuard solution uses FSM-based checkers to pad and discard data in order to maintain semantic alignment between program control flow and the data communicated between processors. CommGuard techniques are low overhead and they exploit application information already provided by some parallel programming languages (e.g. StreamIt). By converting potentially catastrophic communication errors into potentially tolerable data errors, CommGuard allows important streaming applications like JPEG and MP3 decoding to execute without crashing and to sustain good output quality, even for errors as frequent as every 500μs.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121413530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael F. Ringenburg, Adrian Sampson, Isaac Ackerman, L. Ceze, D. Grossman
{"title":"Monitoring and Debugging the Quality of Results in Approximate Programs","authors":"Michael F. Ringenburg, Adrian Sampson, Isaac Ackerman, L. Ceze, D. Grossman","doi":"10.1145/2694344.2694365","DOIUrl":"https://doi.org/10.1145/2694344.2694365","url":null,"abstract":"Energy efficiency is a key concern in the design of modern computer systems. One promising approach to energy-efficient computation, approximate computing, trades off output accuracy for significant gains in energy efficiency. However, debugging the actual cause of output quality problems in approximate programs is challenging. This paper presents dynamic techniques to debug and monitor the quality of approximate computations. We propose both offline debugging tools that instrument code to determine the key sources of output degradation and online approaches that monitor the quality of deployed applications. We present two offline debugging techniques and three online monitoring mechanisms. The first offline tool identifies correlations between output quality and the execution of individual approximate operations. The second tracks approximate operations that flow into a particular value. Our online monitoring mechanisms are complementary approaches designed for detecting quality problems in deployed applications, while still maintaining the energy savings from approximation. We present implementations of our techniques and describe their usage with seven applications. Our online monitors control output quality while still maintaining significant energy efficiency gains, and our offline tools provide new insights into the effects of approximation on output quality.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123515899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 8B: Memory Management","authors":"A. Buyuktosunoglu","doi":"10.1145/3251041","DOIUrl":"https://doi.org/10.1145/3251041","url":null,"abstract":"","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124575645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhangxi Tan, Zhenghao Qian, X. Chen, K. Asanović, D. Patterson
{"title":"DIABLO","authors":"Zhangxi Tan, Zhenghao Qian, X. Chen, K. Asanović, D. Patterson","doi":"10.1145/2786763.2694362","DOIUrl":"https://doi.org/10.1145/2786763.2694362","url":null,"abstract":"","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133784080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md. E. Haque, Y. Eom, Yuxiong He, S. Elnikety, R. Bianchini, K. McKinley
{"title":"Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services","authors":"Md. E. Haque, Y. Eom, Yuxiong He, S. Elnikety, R. Bianchini, K. McKinley","doi":"10.1145/2694344.2694384","DOIUrl":"https://doi.org/10.1145/2694344.2694384","url":null,"abstract":"Interactive services, such as Web search, recommendations, games, and finance, must respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ percentile) latency. Although every server is multicore, parallelizing individual requests to reduce tail latency is challenging because (1) service demand is unknown when requests arrive; (2) blindly parallelizing all requests quickly oversubscribes hardware resources; and (3) parallelizing the numerous short requests will not improve tail latency. This paper introduces Few-to-Many (FM) incremental parallelization, which dynamically increases parallelism to reduce tail latency. FM uses request service demand profiles and hardware parallelism in an offline phase to compute a policy, represented as an interval table, which specifies when and how much software parallelism to add. At runtime, FM adds parallelism as specified by the interval table indexed by dynamic system load and request execution time progress. The longer a request executes, the more parallelism FM adds. We evaluate FM in Lucene, an open-source enterprise search engine, and in Bing, a commercial Web search engine. FM improves the 99th percentile response time up to 32% in Lucene and up to 26% in Bing, compared to prior state-of-the-art parallelization. Compared to running requests sequentially in Bing, FM improves tail latency by a factor of two. These results illustrate that incremental parallelism is a powerful tool for reducing tail latency.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127233346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond, Milind Kulkarni
{"title":"Hybrid Static–Dynamic Analysis for Statically Bounded Region Serializability","authors":"Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond, Milind Kulkarni","doi":"10.1145/2694344.2694379","DOIUrl":"https://doi.org/10.1145/2694344.2694379","url":null,"abstract":"Data races are common. They are difficult to detect, avoid, or eliminate, and programmers sometimes introduce them intentionally. However, shared-memory programs with data races have unexpected, erroneous behaviors. Intentional and unintentional data races lead to atomicity and sequential consistency (SC) violations, and they make it more difficult to understand, test, and verify software. Existing approaches for providing stronger guarantees for racy executions add high run-time overhead and/or rely on custom hardware. This paper shows how to provide stronger semantics for racy programs while providing relatively good performance on commodity systems. A novel hybrid static--dynamic analysis called emph{EnfoRSer} provides end-to-end support for a memory model called emph{statically bounded region serializability} (SBRS) that is not only stronger than weak memory models but is strictly stronger than SC. EnfoRSer uses static compiler analysis to transform regions, and dynamic analysis to detect and resolve conflicts at run time. By demonstrating commodity support for a reasonably strong memory model with reasonable overheads, we show its potential as an always-on execution model.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123647798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chang Liu, Austin Harris, Martin Maas, M. Hicks, Mohit Tiwari, E. Shi
{"title":"GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation","authors":"Chang Liu, Austin Harris, Martin Maas, M. Hicks, Mohit Tiwari, E. Shi","doi":"10.1145/2694344.2694385","DOIUrl":"https://doi.org/10.1145/2694344.2694385","url":null,"abstract":"This paper presents a new, co-designed compiler and architecture called GhostRider for supporting privacy preserving computation in the cloud. GhostRider ensures all programs satisfy a property called memory-trace obliviousness (MTO): Even an adversary that observes memory, bus traffic, and access times while the program executes can learn nothing about the program's sensitive inputs and outputs. One way to achieve MTO is to employ Oblivious RAM (ORAM), allocating all code and data in a single ORAM bank, and to also disable caches or fix the rate of memory traffic. This baseline approach can be inefficient, and so GhostRider's compiler uses a program analysis to do better, allocating data to non-oblivious, encrypted RAM (ERAM) and employing a scratchpad when doing so will not compromise MTO. The compiler can also allocate to multiple ORAM banks, which sometimes significantly reduces access times.We have formalized our approach and proved it enjoys MTO. Our FPGA-based hardware prototype and simulation results show that GhostRider significantly outperforms the baseline strategy.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116425158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikita Mishra, Huazhe Zhang, J. Lafferty, H. Hoffmann
{"title":"A Probabilistic Graphical Model-based Approach for Minimizing Energy Under Performance Constraints","authors":"Nikita Mishra, Huazhe Zhang, J. Lafferty, H. Hoffmann","doi":"10.1145/2694344.2694373","DOIUrl":"https://doi.org/10.1145/2694344.2694373","url":null,"abstract":"In many deployments, computer systems are underutilized -- meaning that applications have performance requirements that demand less than full system capacity. Ideally, we would take advantage of this under-utilization by allocating system resources so that the performance requirements are met and energy is minimized. This optimization problem is complicated by the fact that the performance and power consumption of various system configurations are often application -- or even input -- dependent. Thus, practically, minimizing energy for a performance constraint requires fast, accurate estimations of application-dependent performance and power tradeoffs. This paper investigates machine learning techniques that enable energy savings by learning Pareto-optimal power and performance tradeoffs. Specifically, we propose LEO, a probabilistic graphical model-based learning system that provides accurate online estimates of an application's power and performance as a function of system configuration. We compare LEO to (1) offline learning, (2) online learning, (3) a heuristic approach, and (4) the true optimal solution. We find that LEO produces the most accurate estimates and near optimal energy savings.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122886534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dohyeong Kim, Yonghwi Kwon, Nick Sumner, X. Zhang, Dongyan Xu
{"title":"Dual Execution for On the Fly Fine Grained Execution Comparison","authors":"Dohyeong Kim, Yonghwi Kwon, Nick Sumner, X. Zhang, Dongyan Xu","doi":"10.1145/2694344.2694394","DOIUrl":"https://doi.org/10.1145/2694344.2694394","url":null,"abstract":"Execution comparison has many applications in debugging, malware analysis, software feature identification, and intrusion detection. Existing comparison techniques have various limitations. Some can only compare at the system event level and require executions to take the same input. Some require storing instruction traces that are very space-consuming and have difficulty dealing with non-determinism. In this paper, we propose a novel dual execution technique that allows on-the-fly comparison at the instruction level. Only differences between the executions are recorded. It allows executions to proceed in a coupled mode such that they share the same input sequence with the same timing, reducing nondeterminism. It also allows them to proceed in a decoupled mode such that the user can interact with each one differently. Decoupled executions can be recoupled to share the same future inputs and facilitate further comparison. We have implemented a prototype and applied it to identifying functional components for reuse, comparative debugging with new GDB primitives, and understanding real world regression failures. Our results show that dual execution is a critical enabling technique for execution comparison.","PeriodicalId":403247,"journal":{"name":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115147878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}