{"title":"Improving Relational Database Upon the Arrival of Storage Hardware with Built-in Transparent Compression","authors":"Yifan Qiao, Xubin Chen, Jingpeng Hao, Jiangpeng Li, Qi Wu, Jingqiang Wang, Yang Liu, Tong Zhang","doi":"10.1109/nas51552.2021.9605481","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605481","url":null,"abstract":"This paper presents an approach to enable relational database take full advantage of modern storage hardware with built-in transparent compression. Advanced storage appliances (e.g., all-flash array) and some latest SSDs (solid-state drives) can perform hardware-based data compression, transparently from OS and applications. Moreover, the growing deployment of hardware-based compression capability in Cloud storage infrastructure leads to the imminent arrival of cloud-based storage hardware with built-in transparent compression. To make relational database better leverage modern storage hardware, we propose to deploy a dual in-memory vs. on-storage page format: While pages in database cache memory retain the conventional row-based format, each page on storage devices has a column-based format so that it can be better compressed by storage hardware. We present design techniques that can further improve the on-storage page data compressibility through additional light-weight column data transformation. We the impact of compression algorithms on the selection of column data transformation techniques. We integrated the design techniques into MySQL/InnoDB by adding only about 600 lines of code, and ran Sysbench OLTP workloads on a commercial SSD with built-in transparent compression. 
The results show that the proposed solution can bring up to 45% additional reduction in storage cost with only a few percent of performance degradation.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121945097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Terrestrial and Space-based Cloud Computing with Scalable, Responsible and Explainable Artificial Intelligence - A Position Paper","authors":"D. Martizzi, P. Ray","doi":"10.1109/nas51552.2021.9605446","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605446","url":null,"abstract":"Adoption of cloud computing and storage is becoming ubiquitous in multiple industries and human endeavors. The cloud computing market is expected to significantly evolve during the next decade. In particular, in order to enhance security and remote accessibility, several new architectures have been proposed to move a significant part of the cloud stack to satellites in space. These technologies are expected to become more prominent in the coming years. Despite the significant improvements hybrid terrestrial and space-based cloud architectures would bring, the growth in size of both infrastructures and distributed compute and storage tasks poses a significant challenges for organizations interested in deploying their software stack to cloud. In this Position Paper, we provide a series of basic principles to develop a scalable, responsible and explainable Artificial Intelligence platform that will assist experts in enhancing the efficiency of cloud deployments.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130278478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Migrating Software from x86 to ARM Architecture: An Instruction Prediction Approach","authors":"B. W. Ford, Apan Qasem, Jelena Tešić, Ziliang Zong","doi":"10.1109/nas51552.2021.9605443","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605443","url":null,"abstract":"For decades, the x86 architecture supported by Intel and AMD has been the dominate target for software development. Recently, ARM has solidified itself as a highly competitive and promising CPU architecture by exhibiting both high performance and low power consumption simultaneously. In the foreseeable future, a copious amount of software will be fully migrated to the ARM architecture or support both x86 and ARM simultaneously. Nevertheless, software ports from x86 to ARM are not trivial for a number of reasons. First, it is time consuming to write code that resolves all compatibility issues for a new architecture. Second, specific hardware (e.g. ARM chips) and supporting toolkits (e.g. libraries and compilers) may not be readily available for developers, which will delay the porting process. Third, it is hard to predict the performance of software before testing it on production chips. In this paper, we strive to tackle these challenges by proposing an instruction prediction method that can automatically generate AARCH64 code from existing x86-64 executables. Although the generated code might not be directly executable, it provides a cheap and efficient solution for developers to estimate certain runtime metrics before actually building, deploying and testing code on an ARM-based CPU. Our experimental results show that AARCH64 instructions derived using prediction can achieve a high Bilingual Evaluation Understudy (BLEU) Score. 
This indicates a close quality match between the generated executables and natively ported AARCH64 software.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129873139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Variable-Grained Resemblance Data Deduplication For Cloud Storage","authors":"Xuming Ye, Jia Tang, Wenlong Tian, Ruixuan Li, Weijun Xiao, Yuqing Geng, Zhiyong Xu","doi":"10.1109/nas51552.2021.9605398","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605398","url":null,"abstract":"With the prevalence of cloud storage, data deduplication has been a widely used technology by removing cross users’ duplicate data and saving network bandwidth. Nevertheless, traditional data deduplication hardly detects duplicate data among resemblance chunks. Currently, a resemblance data deduplication, called Finesse, has been proposed to detect and remove the duplicate data among similar chunks efficiently. However, we observe that the chunks following the similar chunk have a high chance of resembling data locality property, and vice versa. Processing these adjacent similar chunks in small average chunk size level increases the metadata, which deteriorates the deduplication system performance. Moreover, existing resemblance data deduplication schemes ignore the performance impact from metadata. Therefore, we propose a fast variable-grained resemblance data deduplication for cloud storage. It dynamically combines the adjacent resemblance chunks or unique chunks or breaks those chunks, located at the transition region between resemblance chunks and unique chunks. Finally, we implement a prototype and conduct a serial of experiments on real-world datasets. 
The results show that our method dramatically reduces the metadata size while achieving a high deduplication ratio.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115323981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Incast-Coflow-Aware Minimum-Rate-Guaranteed Congestion Control Protocol for Datacenter Applications","authors":"Zhijun Wang, Yunxiang Wu, Stoddard Rosenkrantz, Ning Li, Minh Nguyen, Hao Che","doi":"10.1109/nas51552.2021.9605478","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605478","url":null,"abstract":"Today s datacenters need to meet service level objectives (SLOs) for applications, which can be translated into deadlines for (co)flows running between job execution stages. As a result, meeting (co)flow deadlines with high probabilities is essential to attract and retain customers and hence, generate high revenue. To fill the lack of a transport protocol that can facilitate low (co)flow deadline miss rate, especially in the face of incast congestion, in this paper, we propose DCMRG, an incast-coflow-aware, ECN-based soft minimum-rate-guaranteed congestion control protocol for datacenter applications. DCMRG is composed of two major components, i.e., a congestion controller running on the send host and an incast congestion controller running on the receive host. DCMRG possesses three salient features. First, it is the first congestion control protocol that integrates congestion control with coflow-aware incast control while providing soft minimum flow rate guarantee. Second, DCMRG is readily deployable in datacenter networks. It only requires software upgrade in the hosts and minimum assistance (i.e., ECN) from in-network nodes. Third, DCMRG is backward compatible with and, by design, friendly to the widely deployed, standard-based transport protocols, such as DCTCP. The results from large-scale datacenter network simulation demonstrate that in the absence of incast congestion, DCMRG can reduce flow deadline miss rates by 3x and 1.6x compared to D2TCP and MRG, respectively. 
Moreover, DCMRG further reduces the coflow deadline miss rate by more than 40% and 60% and lowers the packet drop probability by 60% and 80%, in the face of incast congestion, compared to D2TCP with ICTCP and MRG with ICTCP, respectively.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125279456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache Compression with Efficient in-SRAM Data Comparison","authors":"Xiaowei Wang, C. Augustine, E. Nurvitadhi, R. Iyer, Li Zhao, R. Das","doi":"10.1109/nas51552.2021.9605440","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605440","url":null,"abstract":"We present a novel cache compression method that leverages the fine-grained data duplication across cache lines. We leverage the XOR operation of the in-SRAM bit-line computing peripherals, to search for compressible data over a wide range of data locations on cache, reducing the data movement requirements. To reduce the decompression latency, we design specialized compression schemes by fetching the data with the same parallelism as the original cache, according to the architecture of the last-level cache slice. The proposed compression method achieves a 2.05× compression ratio on average (up to 67×), and 4.73% of speedup on average (up to 29%), over the SPEC2006 benchmarks.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121592232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case Study of Migrating RocksDB on Intel Optane Persistent Memory","authors":"Ziyi Lu, Q. Cao","doi":"10.1109/nas51552.2021.9605438","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605438","url":null,"abstract":"The application of product-level persistent memory (PM) presents a great opportunity for key-value stores. However, PM devices differ significantly from traditional block-based storage devices such as HDD and SSD in terms of IO characteristics and approaches. To reveal the adaptability of existing persistent key-value store on PM and to explore the potential optimization space of PM-based key-value stores, we migrate one of the most widely used persistent key-value store, RocksDB, to PM device and evaluated its performance. The results show that the performance of RocksDB is limited by the traditional IO stacks optimized for fast SSDs on PM devices. We then perform further experimental analysis on the IO methods of the two main files, log and SST, in RocksDB. Based on the results, we propose a set of optimized IO configurations for each of the two files. These configurations improve read and write performance of RocksDB by up to 3× and 2×, respectively, over the default configurations on an Intel Optane Persistent Memory.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115215576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constant Time Garbage Collection in SSDs","authors":"Reza Salkhordeh, Kevin Kremer, Lars Nagel, Dennis Maisenbacher, Hans Holmberg, Matias Bjørling, A. Brinkmann","doi":"10.1109/nas51552.2021.9605386","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605386","url":null,"abstract":"The Flash Translation Layer (FTL) plays a crucial role for the performance and lifetime of SSDs. It has been difficult to evaluate different FTL strategies in real SSDs in the past, as the FTL has been deeply embedded into the SSD hardware. Recent host-based FTL architectures like ZNS now enable researchers to implement and evaluate new FTL strategies. In this paper, we evaluate the overhead of various garbage collection strategies using a host-side FTL, and show their performance limitations when scaling the SSD size or the number of outstanding requests. To address these limitations, we propose constant cost-benefit policy, which removes the scalability limitations of previous policies and can be efficiently deployed on host-based architectures. The experimental results show that our proposed policy significantly reduces the CPU overhead while having a comparable write amplification compared to the best previous policies.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115465433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Proactive Lightweight Serverless Edge Cloud for Internet-of-Things Applications","authors":"Ian Wang, Shixiong Qi, Elizabeth Liri, K. Ramakrishnan","doi":"10.1109/nas51552.2021.9605384","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605384","url":null,"abstract":"Edge cloud solutions that bring the cloud closer to the sensors can be very useful to meet the low latency requirements of many Internet-of-Things (IoT) applications. However, IoT traffic can also be intermittent, so running applications constantly can be wasteful. Therefore, having a serverless edge cloud that is responsive and provides low-latency features is a very attractive option for a resource and cost-efficient IoT application environment.In this paper, we discuss the key components needed to support IoT traffic in the serverless edge cloud and identify the critical challenges that make it difficult to directly use existing serverless solutions such as Knative, for IoT applications. These include overhead from heavyweight components for managing the overall system and software adaptors for communication protocol translation used in off-the-shelf serverless platforms that are designed for large-scale centralized clouds. The latency imposed by ‘cold start’ is a further deterrent.To address these challenges we redesign several components of the Knative serverless framework. We use a streamlined protocol adaptor to leverage the MQTT IoT protocol in our serverless framework for IoT event processing. We also create a novel, event-driven proxy based on the extended Berkeley Packet Filter (eBPF), to replace the regular heavyweight Knative queue proxy. 
Our preliminary experimental results show that the event-driven proxy is a suitable replacement for the queue proxy in an IoT serverless environment and results in lower CPU usage and a higher request throughput.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127998209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vulkan vs OpenGL ES: Performance and Energy Efficiency Comparison on the big.LITTLE Architecture","authors":"Michael Lujan, Michael McCrary, B. W. Ford, Ziliang Zong","doi":"10.1109/nas51552.2021.9605447","DOIUrl":"https://doi.org/10.1109/nas51552.2021.9605447","url":null,"abstract":"Mobile apps such as games and virtual reality(VR) are getting increasingly popular but they drain battery quickly due to the heavy graphics rending process. Currently, Open Graphics Library for Embedded Systems (OpenGL ES) is the dominating API for rendering advanced graphics on embedded and mobile systems. Despite the attracting usability of OpenGL ES, the lacking support of multi-threading limits its performance and power efficiency on modern multicore mobile chips, especially when the big.LITTLE architecture has become the de facto industry standard of mobile phones. Vulkan was recently proposed to address the weaknesses of OpenGL but its performance and energy efficiency on the big.LITTLE architecture has not been fully explored yet. This paper conducts a comprehensive study to compare the performance and energy efficiency of Vulkan versus OpenGL ES on an ARM processor with both high performance cores (i.e. big cores) and low power cores (i.e. LITTLE cores). Our experimental results show that 1) Vulkan can save up to 24% of energy by leveraging multi-threading and parallel execution on LITTLE cores for heavy workloads; and 2) Vulkan can render at a much higher frame rate when OpenGL ES has reached its full capability. Meanwhile, writing efficient Vulkan code is not trivial and the performance/energy gains are negligible for light workloads. 
The clear tradeoff between manually optimizing verbose Vulkan code and the potential performance or energy efficiency benefits should be carefully considered.","PeriodicalId":135930,"journal":{"name":"2021 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131585735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}