{"title":"QuickCheck: using speculation to reduce the overhead of checks in NVM frameworks","authors":"Thomas Shull, Jian Huang, J. Torrellas","doi":"10.1145/3313808.3313822","DOIUrl":"https://doi.org/10.1145/3313808.3313822","url":null,"abstract":"Byte addressable, Non-Volatile Memory (NVM) is emerging as a revolutionary technology that provides near-DRAM performance and scalable memory capacity. To facilitate the usability of NVM, new programming frameworks have been proposed to automatically or semi-automatically maintain crash-consistent data structures, relieving much of the burden of developing persistent applications from programmers. While these new frameworks greatly improve programmer productivity, they also require many runtime checks for correct execution on persistent objects, which significantly affect the application performance. With a characterization study of various workloads, we find that the overhead of these persistence checks in these programmer-friendly NVM frameworks can be substantial and reach up to 214%. Furthermore, we find that programs nearly always access exclusively either a persistent or a non-persistent object at a given site, making the behavior of these checks highly predictable. In this paper, we propose QuickCheck, a technique that biases persistence checks based on their expected behavior, and exploits speculative optimizations to further reduce the overheads of these persistence checks. We evaluate QuickCheck with a variety of data intensive applications such as a key-value store. 
Our experiments show that QuickCheck improves the performance of a persistent Java framework on average by 48.2% for applications that do not require data persistence, and by 8.0% for a persistent memcached implementation running YCSB.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129678041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitigating JIT compilation latency in virtual execution environments","authors":"Martin Kristien, T. Spink, Harry Wagstaff, Björn Franke, Igor Böhm, N. Topham","doi":"10.1145/3313808.3313818","DOIUrl":"https://doi.org/10.1145/3313808.3313818","url":null,"abstract":"Many Virtual Execution Environments (VEEs) rely on Just-in-time (JIT) compilation technology for code generation at runtime, e.g. in Dynamic Binary Translation (DBT) systems or language Virtual Machines (VMs). While JIT compilation improves native execution performance as opposed to e.g. interpretive execution, the JIT compilation process itself introduces latency. In fact, for highly optimizing JIT compilers or compilers not specifically designed for JIT compilation, e.g. LLVM, this latency can cause a substantial overhead. While existing work has introduced asynchronously decoupled JIT compilation task farms to hide this JIT compilation latency, we show that this on its own is not sufficient to mitigate the impact of JIT compilation latency on overall performance. In this paper, we introduce a novel JIT compilation scheduling policy, which performs continuous low-cost profiling of code regions already dispatched for JIT compilation, right up to the point where compilation commences. 
We have integrated our novel JIT compilation scheduling approach into a commercial LLVM-based DBT system and demonstrate speedups of 1.32x on average, and up to 2.31x, over its state-of-the-art concurrent task-farm based JIT compilation scheme across the SPEC CPU2006 and BioPerf benchmark suites.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115684833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"vCPU as a container: towards accurate CPU allocation for VMs","authors":"Li Liu, Haoliang Wang, An Wang, Mengbai Xiao, Yue Cheng, Songqing Chen","doi":"10.1145/3313808.3313814","DOIUrl":"https://doi.org/10.1145/3313808.3313814","url":null,"abstract":"With our increasing reliance on cloud computing, accurate resource allocation for virtual machines (or domains) in the cloud has become more and more important. However, the current design of hypervisors (or virtual machine monitors) fails to accurately allocate resources to the domains in the virtualized environment. In this paper, we claim the root cause is that the protection scope is erroneously used as the resource scope for a domain in the current virtualization design. Such a design flaw prevents the hypervisor from accurately accounting for the resource consumption of each domain. Using virtual CPUs as a container, we propose to redefine the resource scope of a domain, so that the new resource scope is aligned with all the CPU consumption incurred by this domain. As a demonstration, we implement a novel system, called VASE (vCPU as a container), on top of the Xen hypervisor. 
Evaluations on our testbed have shown our proposed approach is effective in accounting for system-wide CPU consumption incurred by domains, while introducing negligible overhead to the system.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128802670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"vSocket: virtual socket interface for RDMA in public clouds","authors":"Dongyang Wang, Binzhang Fu, Gang Lu, Kun Tan, Bei Hua","doi":"10.1145/3313808.3313813","DOIUrl":"https://doi.org/10.1145/3313808.3313813","url":null,"abstract":"RDMA has been widely adopted as a promising solution for high performance networks, but is still unavailable for a large number of socket-based applications running in public clouds for the following reasons. There is no available RDMA virtualization technique that can meet the cloud's requirements. Moreover, it is cost-prohibitive to rewrite the socket-based applications with the Verbs API. To address the above problems, we present vSocket, a software-based RDMA virtualization framework for socket-based applications in public clouds. vSocket takes into account the demands of clouds such as security rules and network isolation, so it can be deployed in the current public clouds. Furthermore, vSocket provides a native socket API so that socket-based applications can use it without any modifications. Finally, to validate the performance gains, we implemented a prototype and compared it with current virtual network solutions against 1) basic network benchmarks and 2) Redis, a typical I/O-intensive application. 
Experimental results show that the latency of basic benchmarks can be reduced by 88% and the throughput of Redis is improved by 4 times.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117146061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ScissorGC: scalable and efficient compaction for Java full garbage collection","authors":"Haoyu Li, Mingyu Wu, B. Zang, Haibo Chen","doi":"10.1145/3313808.3313820","DOIUrl":"https://doi.org/10.1145/3313808.3313820","url":null,"abstract":"Java runtime frees applications from manual memory management through automatic garbage collection (GC). This, however, is usually at the cost of stop-the-world pauses. State-of-the-art collectors leverage multiple generations, which will inevitably suffer from a full GC phase scanning and compacting the whole heap. This induces a pause tens of times longer than normal collections, which largely affects both throughput and latency of applications. In this paper, we comprehensively analyze the full GC performance of the Parallel Scavenge garbage collector in HotSpot. We find that chain-like dependencies among heap regions cause low thread utilization and poor scalability. Furthermore, many heap regions are filled with live objects (referred to as dense regions), which are unnecessary to collect. To address these two problems, we provide ScissorGC, which contains two main optimizations: dynamically allocating shadow regions as compaction destinations to eliminate region dependencies and skipping dense regions to reduce GC workload. 
Evaluation results against the HotSpot JVM of OpenJDK 8/11 show that ScissorGC works on most benchmarks and leads to 5.6X/5.1X improvement at best in full GC throughput and thereby boosts the application performance by up to 61.8%/49.0%.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132670532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tail latency in node.js: energy efficient turbo boosting for long latency requests in event-driven web services","authors":"Wenzhi Cui, Daniel Richins, Yuhao Zhu, V. Reddi","doi":"10.1145/3313808.3313823","DOIUrl":"https://doi.org/10.1145/3313808.3313823","url":null,"abstract":"Cloud-based Web services are shifting to the event-driven, scripting language-based programming model to achieve productivity, flexibility, and scalability. Implementations of this model, however, generally suffer from long tail latencies, which we measure using Node.js as a case study. Unlike in traditional thread-based systems, reducing long tails is difficult in event-driven systems due to their inherent asynchronous programming model. We propose a framework to identify and optimize tail latency sources in scripted event-driven Web services. We introduce profiling that allows us to gain deep insights into not only how asynchronous event-driven execution impacts application tail latency but also how the managed runtime system overhead exacerbates the tail latency issue further. Using the profiling framework, we propose an event-driven execution runtime design that orchestrates the hardware’s boosting capabilities to reduce tail latency. We achieve higher tail latency reductions with lower energy overhead than prior techniques that are unaware of the underlying event-driven program execution model. 
The lessons we derive from Node.js apply to other event-driven services based on scripting language frameworks.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131117045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","authors":"E. Petrank, D. Lea","doi":"10.1145/3313808","DOIUrl":"https://doi.org/10.1145/3313808","url":null,"abstract":"It is our pleasure to welcome you to the 7th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE'11). As the leading conference for presentation of research results on all aspects of virtualization, VEE brings together researchers representing a diverse set of interests. This year, we received 84 abstracts, 68 full submissions, and selected 20 papers for presentation at the conference. In selecting papers, the program committee placed high priority on work that is broadly informative and applicable to both researchers and practitioners. We are confident these papers will make for an interesting conference and a valuable contribution to the study and practice of virtualization. Additionally, the program includes a keynote presentation by David Bacon on virtualizing new forms of devices such as FPGAs. VEE'11 is again co-located with the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 
Our authors, program committee, sponsors, and supporters all span the boundaries between operating systems and programming language implementation, and reflect equally strong academic and industrial interests in the field.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124029948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}