{"title":"PowerPrep: A power management proposal for user-facing datacenter workloads","authors":"V. Govindaraj, Sumitha George, M. Kandemir, J. Sampson, N. Vijaykrishnan","doi":"10.1109/nas51552.2021.9605364","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605364","url":null,"abstract":"Modern data center applications are user facing/latency critical. Our work analyzes the characteristics of such applications i.e., high idleness, unpredictable CPU usage, and high sensitivity to CPU performance. In spite of such execution characteristics, datacenter operators disable sleep states to optimize performance. Deep-sleep states hurt performance mainly due to: a) high wake-latency and b) cache warm-up after exiting deep-sleep. To address these challenges, we quantify three necessary characteristics required to realize deep-sleep states in datacenter applications: a) low wake-latency, b) low resident power, and c) selective retention of cache-state. Using these observations, we show how emerging technological advances can be leveraged to improve the energy efficiency of latency-critical datacenter workloads.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126596183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from Program Chairs","authors":"","doi":"10.1109/nas51552.2021.9605416","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605416","url":null,"abstract":"","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126698001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MLPP: Exploring Transfer Learning and Model Distillation for Predicting Application Performance","authors":"J. Gunasekaran, Cyan Subhra Mishra","doi":"10.1109/nas51552.2021.9605431","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605431","url":null,"abstract":"Performance prediction for applications is quintessential towards detecting malicious hardware and software vulnerabilities. Typically application performance is predicted using the profiling data generated from hardware tools such as linux perf. By leveraging the data, prediction models, both machine learning (ML) based and non ML-based have been proposed. However a majority of these models suffer from either loss in prediction accuracy, very large model sizes, and/or lack of general applicability to different hardware types such as wearables, handhelds, desktops etc. To address the aforementioned inefficiencies, in this paper we proposed MLPP, a machine learning based performance prediction model which can accurately predict application performance, and at the same time be easily transferable to a wide both mobile and desktop hardware platforms by leveraging transfer learning technique. Furthermore, MLPP incorporates model distillation techniques to significantly reduce the model size. Through our extensive experimentation and evaluation we show that MLPP can achieve up to 92.5% prediction accuracy while reducing the model size by up to 3.5 ×.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125482240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Network Delay Variability to Improve QoE of Latency Critical Services","authors":"S. Shukla, M. Farrens","doi":"10.1109/nas51552.2021.9605367","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605367","url":null,"abstract":"Even as cloud providers offer strict guarantees on the intra-cloud delay of requests for Latency-Critical (LC) Services, a high external network delay can result in a large end-to-end delay, causing a low user Quality of Experience (QoE). Furthermore, due to the variability in the external network delay, there is a disconnect between the user’s QoE and the cloud guaranteed service level objective (SLO). Specifically, a request that meets the SLO, can have a high or low QoE depending on the external network delay. In this work we propose a usercentric End-to-end Service Level Objective (ESLO), an extension of the traditional cloud-centric SLO, that guarantees stricter bounds on end-to-end delay and thereby achieving a higher QoE. We show how the variability in the external network delay can be both addressed and leveraged to meet the ESLO and improve server utilization. We propose ESLO-aware extensions to the Kubernetes infrastructure, that uses information about the external network delay and its distribution - (a) to reduce the number of QoE-violating responses by using deadline-based scheduling at the service instances, and (b) to appropriately scale service instances with load. We implement the ESLO-aware framework on the NSF Chameleon cloud testbed and present experimental results demonstrating the benefit of the proposed paradigm.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131344750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Empirical Study of File Systems on Optane Persistent Memory","authors":"Yang Yang, Qian Cao, Shengjue Wang","doi":"10.1109/nas51552.2021.9605448","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605448","url":null,"abstract":"Emerging byte-addressable Non-volatile memories (NVM) are promising techniques as memory-like storage. Researchers have developed many NVM-aware file systems to exploit the benefits of NVM. However, many early file systems are usually evaluated based on DRAM-based simulations or emulations. Their experimental results cannot present the actual behaviors upon real NVM devices, since the devices do not perform like slow DRAMs as expected. In this paper, we provide a comprehensive empirical study of NVM-aware file systems on the first commercially available byte-addressable NVM (i.e., the Intel Optane DC Persistent Memory Module (PMM)). We evaluate and analyze the performance of the kernel-level file systems (XFS-DAX, Ext4-DAX, PMFS, and NOVA) and the user-space file systems (Strata and Libnvmmio) on PMM with various synthetic and real-world benchmarks (FIO, Filebench, FXmark, Redis, etc.). We also employ different file system configurations and different PMM configurations to evaluate their performance impact. We believe that the experimental results and performance analysis will provide implications for the developers of various applications and storage systems to reap the full characteristics of NVMs.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124153555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cached Mapping Table Prefetching for Random Reads in Solid-State Drives","authors":"X. Ruan, Xunfei Jiang, Haiquan Chen","doi":"10.1109/nas51552.2021.9605397","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605397","url":null,"abstract":"Data caching strategies and Garbage Collection on SSDs have been extensively explored in the past years. However, the Mapping Table cache performance has not been well studied. Mapping table provides page translation information to Flash Translation Layer (FTL) in order to translate Logical Page Address (LPA) to Physical Page Address (PPA). Missing in mapping table cache causes extra read transactions to flash storage which results in stalls of I/O requests processing in SSDs. Random read requests are affected more than random write requests since write requests can be handled by write cache effectively. In this paper, we analyze the impact of CMT on different random read requests and present a Cached Mapping Table prefetching approach which fetches logical-to-physical page translation information in order to mitigate the stalls in processing random read requests. Our experimental results show an improvement of average request waiting time by up to 13%.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124186866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Energy-Efficient and Real-Time Cloud Computing","authors":"T. Tekreeti, T. Cao, Xiaopu Peng, T. Bhattacharya, Jianzhou Mao, X. Qin, Wei-Shinn Ku","doi":"10.1109/nas51552.2021.9605453","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605453","url":null,"abstract":"In modern cloud computing environments, there is a tremendous growth of data to be stored and managed in data centers. Large-scale data centers demand high utilization of computing and storage resources, which lead to expensive operational cost for energy usage. Evidence shows that consolidating virtual machines (VMs) can conserve energy consumption in clouds through VM migrations. VM-consolidation techniques, however, inevitably induce a burden on performance. To address this issue, we propose a holistic solution - EGRET - to boost energy efficiency of cloud computing platforms by seamlessly integrating the DVFS scheme with the VM-consolidation technique. EGRET dynamically determines the most energy-efficient strategy by issuing a command to either scale CPU frequencies on a VM or marking the VM as underutilized. We conduct extensive experiments to evaluate the performance of EGRET. The experimental results show that EGRET substantially improves the energy efficiency of cloud computing platforms.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131605341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient NVM Crash Consistency by Mitigating Resource Contention","authors":"Zhiyuan Lu, Jianhui Yue, Yifu Deng, Yifeng Zhu","doi":"10.1109/nas51552.2021.9605429","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605429","url":null,"abstract":"Logging is widely adopted to ensure crash consistency for Non-Volatile Memory (NVM) systems. However, the logging imposes significant performance overhead caused by the extra log operations and ordering constraints between the logging and in-place updates, degrading the system performance. There are some research efforts to reduce the logging overhead. Recently, LAD proposed that exploiting the non-volatility of Asynchronous DRAM Refresh (ADR) buffer can remove log operations for a transaction whose total amount of updated cachelines is smaller than the buffer capacity, ensuring crash consistency. However, on multi-core systems, concurrent transactions contend the scarce ADR buffer and frequently lead to the buffer overflow. Upon the buffer overflow, LAD resorts to logging operations for in-flight transactions, degrading the system performance. Our experiments show that LAD produces a significant number of log operations when multiple transactions run concurrently. To decrease log operations caused by LAD, this paper presents a new transaction execution scheme, called two-stage transaction execution(TSTE), which allows the write requests of a transaction to be in both the ADR buffer and the staging SRAM buffer. Our new scheme performs log operations for a transaction’s write requests in the SRAM buffer and executes in-place update operations for this transaction’s write requests in the ADR buffer. The introduced SRAM buffer can make the ADR buffer serve more update requests, reducing log operations.The evaluation results demonstrate that our proposed schemes can efficiently reduce log operations up to 39.29% and improve the transaction throughput up to 28.22%","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"39 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115021274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Egress Engineering over BGP Label Unicast in MPLS-based Networks","authors":"Sundaram Tirunelveli Radhakrishnan, S. Mohanty","doi":"10.1109/nas51552.2021.9605412","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605412","url":null,"abstract":"Egress Peer Engineering (EPE) is a current technology that is commonly being used to steer egress traffic from an Autonomous System (AS) to external peers via a pre-determined set of links. The steering is generally achieved through a controller that uses Border Gateway Protocol Link-State (BGP-LS) and Segment Routing traffic engineering (SRTE) policies to program the forwarding at the ingress router with a chosen set of MPLS labels that help determine the egress links at the Autonomous System Boundary Routers (ASBRs). An alternate solution is to use BGP Labeled-Unicast (BGP-LU) to distribute the Egress Engineering Labels. We highlight two key limitations in the proposed BGP-LU use-case and provide a solution that mitigates these problem. Our proposed solution is compatible with legacy routers that are currently deployed in production.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130575882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU-Assisted Memory Expansion","authors":"Pisacha Srinuan, Purushottam Sigdel, Xu Yuan, Lu Peng, Paul Darby, Christopher Aucoin, N. Tzeng","doi":"10.1109/nas51552.2021.9605372","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605372","url":null,"abstract":"Recent graphic processing units (GPUs) often come with large on-board physical memory to accelerate diverse parallel program executions on big datasets with regular access patterns, including machine learning (ML) and data mining (DM). Such a GPU may underutilize its physical memory during lengthy ML model training or DM, making it possible to lend otherwise unused GPU memory to applications executed concurrently on the host machine. This work explores an effective approach that lets memory-intensive applications run on the host machine CPU with its memory expanded dynamically onto available GPU on-board DRAM, called GPU-assisted memory expansion (GAME). Targeting computer systems equipped with the recent GPUs, our GAME approach permits speedy executions on CPU with large memory footprints by harvesting unused GPU on-board memory on-demand for swapping, far surpassing competitive GPU executions. Implemented in user space, our GAME prototype lets GPU memory house swapped-out memory pages transparently, without code modifications for high usability and portability. The evaluation of NAS-NPB benchmark applications demonstrates that GAME expedites monotasking (or multitasking) executions considerably by up to 2.1× (or 3.1×), when memory footprints exceed the CPU DRAM size and an equipped GPU has unused VDRAM available for swapping use.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121828247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}