XFaaS: Cross-platform Orchestration of FaaS Workflows on Hybrid Clouds
Aakash Khochare, Tuhin Khare, Varad Kulkarni, Yogesh L. Simmhan
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00053
Abstract: Functions as a Service (FaaS) has gained popularity for programming public clouds due to its simple abstraction, ease of deployment, effortless scaling, and granular billing. Cloud providers also offer basic capabilities to compose these functions into workflows. FaaS and FaaS workflow models, however, are proprietary to each cloud provider. This prevents their portability across cloud providers and requires effort to design workflows that run on different cloud providers or data centers. Such requirements are increasingly important to meet regulatory requirements, leverage cost arbitrage, and avoid vendor lock-in. Further, the FaaS execution models also differ, and the overheads of FaaS workflows due to message indirection and cold starts need custom optimizations for each platform. In this paper, we propose XFaaS, a cross-platform deployment and orchestration engine for FaaS workflows that operates on multiple clouds. XFaaS allows "zero touch" deployment of functions and workflows across AWS and Azure clouds by automatically generating the necessary code wrappers and cloud queues, and by coordinating with the cloud providers' native FaaS engines. It also uses intelligent function fusion and placement logic to reduce workflow execution latency in a hybrid cloud while mitigating costs, using provider-specific performance and billing models based on detailed benchmarks. Our empirical results indicate that fusion offers up to ≈75% lower latency and ≈57% lower cost, while placement strategies reduce latency by ≈24%, compared to baselines in the best cases.
Enabling Fast, Effective Visualization of Voluminous Gridded Spatial Datasets
Paahuni Khandelwal, M. Warushavithana, S. Pallickara, S. Pallickara
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00061
Abstract: Gridded spatial datasets arise naturally in environmental, climatic, meteorological, and ecological settings. Each grid point encapsulates a vector of variables representing different measures of interest. Gridded datasets tend to be voluminous since they encapsulate observations over long timescales. Visualizing such datasets poses significant challenges stemming from the need to preserve interactivity, manage I/O overheads, and cope with data volumes. Here we present our methodology to significantly alleviate I/O requirements by leveraging deep neural network-based models and a distributed, in-memory cache to facilitate interactive visualizations. Our benchmarks demonstrate that deploying our lightweight models coupled with back-end caching and prefetching schemes can reduce the client's query response time by 92.3% while maintaining high perceptual quality, with a PSNR (peak signal-to-noise ratio) of 38.7 dB.
{"title":"Measuring the Impact of Gradient Accumulation on Cloud-based Distributed Training","authors":"Zimeng Huang, Bo Jiang, Tian Guo, Yunzhuo Liu","doi":"10.1109/CCGrid57682.2023.00040","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00040","url":null,"abstract":"Gradient accumulation (GA) is a commonly adopted technique for addressing the GPU memory shortage problem in model training. It reduces memory consumption at the cost of increased computation time. Although widely used, its benefits to model training have not been systematically studied. Our work evaluates and summarizes the benefits of GA, especially in cloud-based distributed training scenarios, where training cost is determined by both execution time and resource consumption. We focus on how GA can be utilized to balance execution time and resource consumption to achieve the lowest bills. Through empirical evaluations on AliCloud platforms, we observe that the total training cost can be reduced by 31.2% on average with a 17.3% increase in training time, when GA is introduced in the large-model and small-bandwidth scenarios with data-parallel training strategies. Besides, taking micro-batch size into optimization can further decrease training time and cost by 21.2% and 24.8% on average, respectively, for hybrid-parallel strategies in large-model and GPU training scenarios.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129747439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CCGRID 2023: A Holistic Approach to Inclusion and Belonging
Beth Plale, Preeti Malakar, Meenakshi D'Souza, H. Kapoor, Yogesh L. Simmhan, I. Altintas, Manohar Swaminathan
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00070
Abstract: "CCGRID will act with responsibility as its primary consideration; with equity, diversity, and inclusion as its central goals." (from the CCGRID 2023 web site [1])
Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric
Xu Zhang, Yisong Chang, Tianyue Lu, Ke Zhang, Mingyu Chen
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00013
Abstract: With the evolution of network fabrics, message-passing clusters have become promising solutions for large-scale graph processing. Alternatively, the shared-memory model has been introduced to avoid redundant copies and extra storage of graph data. Compared to conventional network fabrics, emerging memory semantic interconnects and fabrics, e.g., Intel's Compute Express Link (CXL), offer fine-grained, byte-addressable remote memory access and are intuitively more appropriate for shared-memory clusters. However, due to the latency gap between local and remote memory, it is still challenging to take advantage of shared-memory graph processing over memory semantic fabrics. To tackle this problem, we first investigate the memory access characteristics of graph vertex propagation under the shared-memory model. We then propose GraCXL, a series of design paradigms that address the high-frequency, long-latency remote memory accesses potentially incurred in CXL-based clusters. For system adaptiveness, we tailor GraCXL to both general-purpose CPU clusters and domain-specific FPGA accelerator arrays. In the absence of commodity CXL hardware and platforms, we design a custom fabric implementing the CXL.mem protocol and build an evaluation prototype from ARM SoC-equipped FPGAs. Experimental results show that the proposed GraCXL CPU and FPGA clusters achieve 1.33x-8.92x and 2.48x-5.01x performance improvements, respectively.
Speaker recognition system of flexible throat microphone using contrastive learning
Weiliang Zheng, Zhenxiang Chen, Yang Li, Xiaoqing Jiang, Xueyang Cao
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00065
Abstract: Recently, flexible pressure-sensor-based throat microphones (FTMs) have attracted growing attention for noise-robust speaker recognition and are promising for helping people with specific dysarthria complete speaker recognition. FTMs offer outstanding flexibility compared with hard throat microphones (HTMs) and noise robustness compared with close-talk microphones (CMs). However, speaker recognition with FTMs is still an open task awaiting exploration, since FTM speech suffers from degradation and data sets are lacking. To tackle these two obstacles, drawing on feature-mapping methods for HTMs, we introduce an FTM-oriented supervised contrastive learning (FTMSCL) method. We collect an FTM speech data set, then design a contrastive loss function that avoids the problems of feature-mapping methods and effectively leverages the label information in this data set. Furthermore, we investigate a critical margin parameter in this loss and several data augmentations for FTM speech. Experimental results show that, with no need for CM data, FTMSCL achieves a False Acceptance Rate (FAR) of 2.97% and a False Rejection Rate (FRR) of 2.83%, significantly outperforming both a conventional end-to-end method and an advanced feature-mapping method. Moreover, the best FAR and FRR of our FTMSCL method are only 0.86% and 0.83% higher than those of the best method using clean CM data.
Optimal sizing of a globally distributed low carbon cloud federation
M. Vasconcelos, Daniel Cordeiro, Georges Da Costa, F. Dufossé, J. Nicod, V. Rehn-Sonigo
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00028
Abstract: The carbon footprint of IT technologies has become a significant concern in recent years. This concern mainly focuses on the electricity consumption of data centers, and many cloud suppliers commit to using 100% renewable energy sources. However, this approach neglects the impact of device manufacturing. In this paper, we consider the question of dimensioning the renewable energy sources of a geographically distributed cloud, accounting for the carbon impact of both grid electricity consumption at the considered locations and the manufacturing of solar panels and batteries. We design a linear program that optimizes cloud dimensioning over one year, considering worldwide data-center locations, real-life workload traces, and solar irradiation values. Our results show a carbon footprint reduction of about 30% compared to a cloud fully supplied by solar energy, and of 85% compared to the 100% grid electricity model.
{"title":"Bottleneck identification and failure prevention with procedural learning in 5G RAN","authors":"Tobias Sundqvist, M. Bhuyan, E. Elmroth","doi":"10.1109/CCGrid57682.2023.00047","DOIUrl":"https://doi.org/10.1109/CCGrid57682.2023.00047","url":null,"abstract":"To meet the low latency requirements of 5G Radio Access Networks (RAN), it is essential to learn where performance bottlenecks occur. As parts are distributed and virtualized, it becomes troublesome to identify where unwanted delays occur. Today, vendors spend huge manual effort analyzing key performance indicators (KPIs) and system logs to detect these bottlenecks. The 5G architecture allows a flexible scaling of microservices to handle the variation in traffic. But knowing how, when, and where to scale is difficult without a detailed latency analysis. In this article, we propose a novel method that combines procedural learning with latency analysis of system log events. The method, which we call LogGenie, learns the latency pattern of the system at different load scenarios and automatically identifies the parts with the most significant increase in latency. Our evaluation in an advanced 5G testbed shows that LogGenie can provide a more detailed analysis than previous research has achieved and help troubleshooters locate bottlenecks faster. Finally, through experiments, we show how a latency prediction model can dynamically fine-tune the behavior where bottlenecks occur. This lowers resource utilization, makes the architecture more flexible, and allows the system to fulfill its latency requirements.","PeriodicalId":363806,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126865383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Control Channel Isolation in SDN Virtualization: A Machine Learning Approach
Yeonho Yoo, Gyeongsik Yang, Changyong Shin, J. Lee, C. Yoo
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00034
Abstract: Performance isolation is an essential property that network virtualization must provide for clouds. This study addresses performance isolation of the control plane in virtualized software-defined networking (SDN), which we call control channel isolation. First, we report that control channel isolation is seriously broken in the existing network hypervisor: end-to-end control latency grows by up to 15x as the number of virtual switches increases. This jeopardizes key network operations, such as routing, in data centers. To address this issue, we take a machine learning approach that learns from past control traffic as time-series data. We propose a new network hypervisor, Meteor, that uses an LSTM autoencoder to predict the control traffic per virtual switch. Our evaluation results show that Meteor improves the processing latency per control message by up to 12.7x. Furthermore, Meteor reduces the end-to-end control latency by up to 73.7%, making it comparable to non-virtualized SDN.
A Case Study of Data Management Challenges Presented in Large-Scale Machine Learning Workflows
Claire Songhyun Lee, V. Hewes, G. Cerati, J. Kowalkowski, Adam Aurisano, Ankit Agrawal, Alok Ratan Choudhary, W. Liao
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: https://doi.org/10.1109/CCGrid57682.2023.00017
Abstract: Running scientific workflow applications on high-performance computing systems provides promising results in terms of accuracy and scalability. An example is particle track reconstruction research in high-energy physics, which consists of multiple machine-learning tasks. However, as modern HPC systems scale up, researchers spend more effort coordinating the individual workflow tasks due to their increasing demands on computational power, large memory footprints, and data movement among various storage devices. These issues are further exacerbated when intermediate results must be shared among different tasks and each task is optimized for its own design goals, such as shortest runtime or minimal memory footprint. In this paper, we investigate the data management challenges presented in scientific workflows. We observe that individual tasks, such as data generation, data curation, model training, and inference, often use data layouts that are best for their own I/O performance but ill-suited to successive tasks. We propose solutions that employ alternative data structures and layouts chosen with regard to pairs of tasks running consecutively in the workflow. Our experimental results show up to a 16.46x speedup in initialization time and a 3.42x speedup in I/O time, compared to previous approaches.