Title: Let's wait awhile: how temporal workload shifting can reduce carbon emissions in the cloud
Authors: Philipp Wiesner, Ilja Behnke, Dominik Scheinert, Kordian Gontarska, L. Thamsen
DOI: https://doi.org/10.1145/3464298.3493399
Published: 2021-10-25
Venue: Proceedings of the 22nd International Middleware Conference
Abstract: Depending on energy sources and demand, the carbon intensity of the public power grid fluctuates over time. Exploiting this variability is an important factor in reducing the emissions caused by data centers. However, regional differences in the availability of low-carbon energy sources make it hard to provide general best practices for when to consume electricity. Moreover, existing research in this domain focuses mostly on carbon-aware workload migration across geo-distributed data centers, or addresses demand response purely from the perspective of power grid stability and costs. In this paper, we examine the potential impact of shifting computational workloads towards times when the energy supply is expected to be less carbon-intensive. To this end, we identify characteristics of delay-tolerant workloads and analyze the potential for temporal workload shifting in Germany, Great Britain, France, and California over the year 2020. Furthermore, we experimentally evaluate two workload shifting scenarios in a simulation to investigate the influence of time constraints, scheduling strategies, and the accuracy of carbon intensity forecasts. To accelerate research in the domain of carbon-aware computing and to support the evaluation of novel scheduling algorithms, our simulation framework and datasets are publicly available.

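The core idea of temporal workload shifting can be illustrated with a small sketch (this is not the paper's framework; the function name, forecast values, and parameters are hypothetical): pick the start hour for a delay-tolerant job that minimizes average carbon intensity over its runtime while still meeting its deadline.

```python
# Hypothetical sketch of carbon-aware temporal shifting: choose the start
# hour whose window has the lowest mean carbon intensity (gCO2/kWh),
# subject to the job finishing by its deadline.

def best_start_hour(forecast, duration, deadline):
    """Return the start index in `forecast` minimizing mean intensity.

    forecast: per-hour carbon intensity values
    duration: job length in hours
    deadline: latest hour by which the job must finish
    """
    latest_start = min(deadline - duration, len(forecast) - duration)
    candidates = range(latest_start + 1)
    return min(candidates,
               key=lambda s: sum(forecast[s:s + duration]) / duration)

forecast = [450, 430, 300, 120, 110, 200, 380, 410]  # illustrative values
print(best_start_hour(forecast, duration=2, deadline=6))  # 3 (hours 3-4 are greenest)
```

A real scheduler would work with forecasts rather than known values, which is exactly the forecast-accuracy question the paper evaluates.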
Title: A fresh look at the architecture and performance of contemporary isolation platforms
Authors: V. V. Rijn, Jan S. Rellermeyer
DOI: https://doi.org/10.1145/3464298.3493404
Published: 2021-10-21
Abstract: With the ever-increasing pervasiveness of the cloud computing paradigm, strong isolation guarantees and low performance overhead from isolation platforms are paramount. An ideal isolation platform offers both an impermeable isolation boundary and negligible performance overhead. In this paper, we examine various isolation platforms (containers, secure containers, hypervisors, unikernels) and conduct a wide array of experiments to measure the performance overhead and degree of isolation offered by each. We find that container platforms have the best, near-native performance, while the newly emerging secure containers suffer from various overheads. The highest degree of isolation is achieved by unikernels, closely followed by traditional containers.

Title: Experience Paper: sgx-dl: dynamic loading and hot-patching for secure applications
Authors: Nico Weichbrodt, Joshua Heinemann, Lennart Almstedt, Pierre-Louis Aublin, R. Kapitza
DOI: https://doi.org/10.1145/3464298.3476134
Published: 2021-10-02
Abstract: Trusted execution as offered by Intel's Software Guard Extensions (SGX) is considered an enabler for protecting the integrity and confidentiality of stateful workloads such as key-value stores and databases in untrusted environments. These systems are typically long-running and require extension mechanisms built on top of dynamic loading, as well as hot-patching, to avoid downtimes and apply security updates faster. However, such essential mechanisms are currently neglected or even missing in combination with trusted execution. We present sgx-dl, a lean framework that enables dynamic loading of enclave code at the function level and hot-patching of dynamically loaded code. Additionally, sgx-dl is the first framework to utilize the new SGX version 2 features, and it provides a versioning mechanism for dynamically loaded code. Our evaluation shows that sgx-dl introduces a performance overhead of less than 5% and shrinks application downtime by an order of magnitude in the case of a database system.

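Hot-patching of dynamically loaded code generally relies on an indirection table, so calls can be redirected atomically and patched versions tracked. The following toy sketch illustrates that general pattern in Python; it is not sgx-dl's API, and the class and method names are invented for illustration.

```python
class PatchTable:
    """Toy indirection table: callers invoke functions through the table,
    so a loaded function can be swapped (hot-patched) without restarting,
    and a version counter records how often each entry was patched."""

    def __init__(self):
        self.funcs = {}  # name -> (version, callable)

    def load(self, name, fn):
        self.funcs[name] = (1, fn)          # initial dynamic load

    def patch(self, name, fn):
        version, _ = self.funcs[name]
        self.funcs[name] = (version + 1, fn)  # atomic swap in CPython

    def call(self, name, *args):
        _, fn = self.funcs[name]            # indirection on every call
        return fn(*args)

t = PatchTable()
t.load("square", lambda x: x * x)
t.patch("square", lambda x: x ** 2)         # hot-patched replacement
print(t.funcs["square"][0], t.call("square", 3))  # 2 9
```

In an enclave setting the interesting part, which this sketch omits, is verifying and measuring the patched code before it enters the trusted environment.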
Title: Towards optimal placement and scheduling of DNN operations with Pesto
Authors: Ubaid Ullah Hafeez, Xiao Sun, Anshul Gandhi, Zhenhua Liu
DOI: https://doi.org/10.1145/3464298.3476132
Published: 2021-10-02
Abstract: The increasing size of Deep Neural Networks (DNNs) has necessitated the use of multiple GPUs to host a single DNN model, a practice commonly referred to as model parallelism. The key challenge for model parallelism is to efficiently and effectively partition the DNN model across GPUs to avoid communication overheads while maximizing GPU utilization, with the end goal of minimizing the training time of DNN models. Existing approaches either take a long time (hours or even days) to find an effective partition or settle for sub-optimal partitioning, invariably increasing the end-to-end training effort. In this paper, we design and implement Pesto, a fast and near-optimal model placement technique for automatically partitioning arbitrary DNNs across multiple GPUs. The key idea in Pesto is to jointly optimize the model placement and scheduling at the fine-grained operation level to minimize inter-GPU communication while maximizing the opportunity to parallelize the model across GPUs. By carefully formulating the problem as an integer program, Pesto can provide the optimal placement and scheduling. We implement Pesto in TensorFlow and show that Pesto can reduce model training time by up to 31% compared to state-of-the-art approaches, across several large DNN models.

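Pesto's actual formulation is an integer program over operation placement and scheduling; as a drastically simplified illustration of the underlying trade-off (all names and costs here are hypothetical), consider a linear chain of operations to be split across two GPUs, where the objective balances the heavier GPU's compute load against the communication cost of the cut edge.

```python
def best_cut(compute, comm):
    """Pick where to cut a linear op chain across two GPUs.

    compute[i]: cost of op i; comm[i]: transfer cost of the edge between
    op i and op i+1. Ops 0..cut go to GPU0, the rest to GPU1.
    Returns (cut, objective), minimizing max per-GPU load + cut-edge cost.
    """
    best = None
    for cut in range(len(compute) - 1):
        g0 = sum(compute[:cut + 1])
        g1 = sum(compute[cut + 1:])
        obj = max(g0, g1) + comm[cut]
        if best is None or obj < best[1]:
            best = (cut, obj)
    return best

compute = [4, 3, 6, 2, 5]  # per-op compute cost (illustrative)
comm = [9, 1, 8, 2]        # comm cost if the chain is cut after op i
print(best_cut(compute, comm))  # (1, 14): cheap edge beats an even split
```

Real DNN graphs are DAGs, not chains, which is why Pesto needs an integer program rather than this one-dimensional scan.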
Title: Proceedings of the 22nd International Middleware Conference
DOI: https://doi.org/10.1145/3464298
Published: 2021-10-02

Title: FW-KV: improving read guarantees in PSI
Authors: Masoomeh Javidi Kishi, R. Palmieri
DOI: https://doi.org/10.1145/3464298.3476131
Published: 2021-10-02
Abstract: We present FW-KV, a novel distributed transactional in-memory key-value store that guarantees the Parallel Snapshot Isolation (PSI) correctness level. FW-KV's primary goal is to allow its read-only transactions to access more up-to-date (fresher) versions of objects than Walter, the state-of-the-art implementation of PSI. FW-KV achieves this without assuming synchrony or a synchronized clock service. The improved level of freshness comes at no significant performance degradation, especially in low-contention workloads, as assessed by our evaluation study including two standard OLTP benchmarks, YCSB and TPC-C. The performance gap between FW-KV and Walter is less than 5% in low-contention scenarios, and less than 28% in high contention.

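The freshness question FW-KV addresses can be pictured with a generic multiversion store (this sketch illustrates multiversioning in general, not FW-KV's protocol; all names are invented): each reader sees the newest version committed at or before its snapshot timestamp, so fresher snapshots translate directly into fresher reads.

```python
class MVStore:
    """Toy multiversion key-value map: each key holds a list of
    (commit_ts, value) pairs kept sorted by commit timestamp; a reader
    sees the newest version with commit_ts <= its snapshot timestamp."""

    def __init__(self):
        self.versions = {}

    def write(self, key, value, commit_ts):
        chain = self.versions.setdefault(key, [])
        chain.append((commit_ts, value))
        chain.sort()  # keep versions ordered by commit timestamp

    def read(self, key, snapshot_ts):
        # scan newest-first for the freshest visible version
        for ts, val in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return val
        return None  # no version visible at this snapshot

s = MVStore()
s.write("x", "v1", commit_ts=10)
s.write("x", "v2", commit_ts=20)
print(s.read("x", snapshot_ts=15))  # v1
print(s.read("x", snapshot_ts=25))  # v2
```

The hard part, which the paper tackles, is advancing readers' snapshots safely without synchronized clocks; this sketch simply assumes the timestamps are given.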
Title: Precursor: a fast, client-centric and trusted key-value store using RDMA and Intel SGX
Authors: I. Messadi, Shivananda Neumann, Nico Weichbrodt, Lennart Almstedt, Mohammad Mahhouk, R. Kapitza
DOI: https://doi.org/10.1145/3464298.3476129
Published: 2021-10-02
Abstract: Trusted execution, as offered by the Intel Software Guard Extensions (SGX), enables confidentiality and integrity for off-site deployed services. Securing key-value stores has received particular attention, as they are a building block that many complex applications use to speed up request processing. Initially, developers' main design challenge has been to address the performance barriers of SGX. Beyond this, we identify the integration of an SGX-secured key-value store with recent network technologies, especially RDMA, as an essential emerging requirement. RDMA allows fast direct access to remote memory at high bandwidth. As SGX-protected memory cannot be directly accessed over the network, a fast exchange between the main and trusted memory must be enabled. More importantly, SGX-protected services can be expected to be CPU-bound as a result of the vast number of cryptographic operations required to transfer and store data securely. In this paper, we present Precursor, a new key-value store design that utilizes trusted execution to offer confidentiality and integrity while relying on RDMA for low-latency and high-bandwidth communication. Precursor offloads cryptographic operations to the client side to prevent a server-side CPU bottleneck, and reduces data movement in and out of the trusted execution environment. Our evaluation shows that Precursor achieves up to 6--8.5 times higher throughput when compared against similar SGX-secured key-value store approaches.

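The client-side offloading pattern can be sketched generically (this is not Precursor's actual protocol; for brevity this stdlib-only example computes an integrity tag with HMAC and omits encryption, which a real system would add, e.g. AES-GCM): the client seals values before sending them, so the untrusted server only stores opaque blobs and performs no cryptographic work itself.

```python
import hashlib
import hmac
import os

KEY = os.urandom(32)  # client-held secret; the server never sees it

def client_seal(value: bytes) -> bytes:
    # Integrity tag computed on the client; a real design would also
    # encrypt the value (e.g. AES-GCM), which this sketch omits.
    tag = hmac.new(KEY, value, hashlib.sha256).digest()
    return tag + value

def client_open(blob: bytes) -> bytes:
    tag, value = blob[:32], blob[32:]
    expected = hmac.new(KEY, value, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tampered value")
    return value

store = {}                       # stands in for the untrusted server
store[b"k"] = client_seal(b"v")  # server stores only the opaque blob
print(client_open(store[b"k"]))  # b'v'
```

Shifting the tag computation and verification to clients is what relieves the server-side CPU, at the cost of key distribution among clients.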
Title: PProx: efficient privacy for recommendation-as-a-service
Authors: Guillaume Rosinosky, Simon Da Silva, Sonia Ben Mokhtar, D. Négru, Laurent Réveillère, E. Rivière
DOI: https://doi.org/10.1145/3464298.3476130
Published: 2021-10-02
Abstract: We present PProx, a system preventing recommendation-as-a-service (RaaS) providers from accessing sensitive data about the users of applications leveraging their services. PProx does not impact recommendation accuracy, is compatible with arbitrary recommendation algorithms, and has minimal deployment requirements. Its design combines two proxying layers running directly inside SGX enclaves at the RaaS provider side. These layers transparently pseudonymize users and items and hide the links between the two, and PProx's privacy guarantees are robust even to the corruption of one of these enclaves. We integrated PProx with Harness's Universal Recommender and evaluated it on a 27-node cluster. Our results indicate its ability to withstand a high number of requests with low end-to-end latency, horizontally scaling up to match increasing recommendation workloads.

Title: Memory at your service: fast memory allocation for latency-critical services
Authors: Aidi Pi, Junxian Zhao, Shaoqi Wang, Xiaobo Zhou
DOI: https://doi.org/10.1145/3464298.3493394
Published: 2021-09-07
Abstract: Co-location and memory sharing between latency-critical services, such as key-value stores and web search, and best-effort batch jobs is an appealing approach to improving memory utilization in multi-tenant datacenter systems. However, we find that the very diverse goals of co-located jobs and the GNU/Linux system stack can lead to severe performance degradation of latency-critical services under memory pressure in a multi-tenant system. We address memory pressure for latency-critical services via fast memory allocation and proactive reclamation. We find that memory allocation latency dominates the overall query latency, especially under memory pressure. We analyze the default memory management mechanism provided by the GNU/Linux system stack and identify the reasons why it is inefficient for latency-critical services in a multi-tenant system. We present Hermes, a fast memory allocation mechanism in user space that adaptively reserves memory for latency-critical services and advises the Linux OS to proactively reclaim the memory of batch jobs. We implement Hermes in the GNU C Library. Experimental results show that Hermes reduces the average and the 99th-percentile memory allocation latency by up to 54.4% and 62.4%, respectively, for a micro-benchmark. For two real-world latency-critical services, Hermes reduces both the average and the 99th-percentile tail query latency by up to 40.3%. Compared to the default glibc, jemalloc, and TCMalloc, Hermes reduces Service Level Objective violations by up to 84.3% under memory pressure.

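The reserve-then-pop idea behind fast user-space allocation can be sketched as follows (a toy free-list in Python, not Hermes's glibc implementation; names and sizes are illustrative): buffers are reserved ahead of time so that allocation on the latency-critical path is a constant-time pop rather than a potentially slow trip into the system allocator.

```python
class ReservedPool:
    """Toy reserved-memory pool: pre-allocates fixed-size buffers so the
    latency-critical fast path never waits on the system allocator."""

    def __init__(self, buf_size, count):
        self.buf_size = buf_size
        # reservation happens up front, off the critical path
        self.free = [bytearray(buf_size) for _ in range(count)]

    def alloc(self):
        if self.free:
            return self.free.pop()          # fast path: constant-time pop
        return bytearray(self.buf_size)     # slow path: fall back to allocator

    def release(self, buf):
        self.free.append(buf)               # return buffer for reuse

pool = ReservedPool(buf_size=4096, count=8)
buf = pool.alloc()
pool.release(buf)
print(len(buf))  # 4096
```

The adaptive part of Hermes, deciding how much to reserve and when to advise the OS to reclaim batch-job memory, is the research contribution this sketch leaves out.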
Title: YASMIN: a real-time middleware for COTS heterogeneous platforms
Authors: Benjamin Rouxel, S. Altmeyer, C. Grelck
DOI: https://doi.org/10.1145/3464298.3493402
Published: 2021-08-02
Abstract: Commercial-off-the-shelf (COTS) heterogeneous platforms provide immense computational power, but are difficult to program and to use correctly when real-time requirements come into play: a sound configuration of the operating system scheduler is needed, and a suitable mapping of tasks to computing units must be determined. Flawed designs lead to sub-optimal system configurations and, thus, to wasted resources or even to deadline misses and system failures. We propose YASMIN, a middleware that schedules end-user applications with real-time requirements in user space and on behalf of the operating system. YASMIN combines an easy-to-use programming interface with portability across a wide range of architectures. It treats heterogeneity on COTS embedded platforms as a first-class citizen: YASMIN supports multiple functionally equivalent task implementations with distinct extra-functional behaviour. This enables the system designer to quickly explore different scheduling policies and task-to-core mappings, and thus to improve overall system performance. In this paper, we present the design and implementation of YASMIN and provide an analysis of the scheduling overhead on an Odroid-XU4 platform. We demonstrate the merits of YASMIN on an industrial use-case involving a search-and-rescue drone.
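Selecting among functionally equivalent task implementations with distinct timing behaviour is, at its core, a mapping problem. A hypothetical greedy sketch (this is not YASMIN's scheduler; task names and WCETs are invented) assigns each task the variant/core pair that finishes earliest given the load already placed on each core.

```python
def map_tasks(tasks, cores):
    """tasks: {task: {core_type: wcet}} - one WCET per available variant.
    cores: {core_type: count} - physical cores of each type.
    Greedily assign each task to the core whose finish time, after adding
    the matching variant's WCET, is smallest. Returns {task: core}."""
    # one accumulated-load counter per physical core
    load = {(ctype, i): 0 for ctype, n in cores.items() for i in range(n)}
    mapping = {}
    for task, variants in tasks.items():
        core, finish = min(
            ((c, load[c] + wcet)
             for ctype, wcet in variants.items()
             for c in load if c[0] == ctype),
            key=lambda cf: cf[1])
        mapping[task] = core
        load[core] = finish
    return mapping

tasks = {"detect": {"gpu": 2, "cpu": 9},  # two variants, GPU far faster
         "plan": {"cpu": 3},
         "log": {"cpu": 1}}
cores = {"cpu": 2, "gpu": 1}
mapping = map_tasks(tasks, cores)
print(mapping)
```

A real-time middleware would of course test schedulability against deadlines rather than just balancing load, but the sketch shows how per-variant WCETs drive the mapping choice.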