2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS): Latest Articles, Page 4

TrimTuner: Efficient Optimization of Machine Learning Jobs in the Cloud via Sub-Sampling

2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) Pub Date : 2020-11-09 DOI: 10.1109/MASCOTS50786.2020.9285971

Pedro Mendes, Maria Casimiro, P. Romano, D. Garlan

This work introduces TrimTuner, the first system for optimizing machine learning jobs in the cloud to exploit sub-sampling techniques to reduce the cost of the optimization process, while taking into account user-specified constraints. TrimTuner jointly optimizes the cloud and application-specific parameters and, unlike state-of-the-art works for cloud optimization, eschews the need to train the model with the full training set every time a new configuration is sampled. Indeed, by leveraging sub-sampling techniques and datasets that are up to 60× smaller than the original one, we show that TrimTuner can reduce the cost of the optimization process by up to 50×. Further, TrimTuner speeds up the recommendation process by 65× with respect to state-of-the-art techniques for hyperparameter optimization that use sub-sampling techniques. The reasons for this improvement are twofold: i) a novel domain-specific heuristic that reduces the number of configurations for which the acquisition function has to be evaluated; ii) the adoption of an ensemble of decision trees that boosts the speed of the recommendation process by one additional order of magnitude.

Citations: 13

COCOA: Cold Start Aware Capacity Planning for Function-as-a-Service Platforms

2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) Pub Date : 2020-07-02 DOI: 10.1109/MASCOTS50786.2020.9285966

Alim Ul Gias, G. Casale

Function-as-a-Service (FaaS) has become increasingly popular in the software industry due to the implied cost savings in event-driven workloads and its synergy with DevOps. To size an on-premise FaaS platform, it is important to estimate the CPU and memory capacity required to serve the expected loads. Given the service-level agreements, it is, however, challenging to take the cold start issue into account during the sizing process. We have investigated the similarity of this problem with the hit rate improvement problem in Time to Live (TTL) caches and concluded that solutions for TTL caches, although potentially applicable, lead to over-provisioning in FaaS. Thus, we propose a novel approach, COCOA, to solve this issue. COCOA uses a queueing-based approach to assess the effect of cold starts on FaaS response times. It also considers different memory consumption values depending on whether the function is idle or in execution. Using an event-driven FaaS simulator, FaasSim, that we have developed, we show that COCOA can reduce over-provisioning by over 70% under some of the workloads we have considered, while satisfying the service-level agreements.

Citations: 22

Effective Elastic Scaling of Deep Learning Workloads

2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) Pub Date : 2020-06-24 DOI: 10.1109/MASCOTS50786.2020.9285954

Vaibhav Saxena, K. R. Jayaram, Saurav Basu, Yogish Sabharwal, Ashish Verma

We examine the elastic scaling of Deep Learning (DL) jobs and propose a novel resource allocation strategy for DL training jobs, resulting in improved job run-time performance as well as increased cluster utilization. We begin by analyzing DL workloads and exploit the fact that DL jobs can be run with a range of batch sizes without affecting their final accuracy. We formulate an optimization problem that explores a dynamic batch size allocation to individual DL jobs based on their scaling efficiency when running on multiple nodes. We design a fast dynamic-programming-based optimizer to solve this problem in real time to determine jobs that can be scaled up or down, and use this optimizer in an autoscaler to dynamically change the allocated resources and batch sizes of individual DL jobs. We demonstrate empirically that our elastic scaling algorithm can complete up to […] as many jobs as a strong baseline algorithm that also scales the number of GPUs but does not change the batch size, with average completion times up to […] faster.

Citations: 9

A Smart Background Scheduler for Storage Systems

2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) Pub Date : 2020-06-02 DOI: 10.1109/MASCOTS50786.2020.9285967

Maher Kachmar, D. Kaeli

In today's enterprise storage systems, supported data services such as snapshot delete or drive rebuild can result in tremendous performance overhead if executed inline along with heavy foreground IO, often leading to missed Service Level Objectives (SLOs). Typical storage system applications such as Virtual Desktop Infrastructure (VDI) or web services follow a repetitive high/low workload pattern that can be learned and forecasted. We propose a priority-based background scheduler that learns this pattern and allows storage systems to maintain peak performance and meet SLOs while supporting a number of data services. When foreground IO demand intensifies, system resources are dedicated to servicing foreground IO requests, and any background processing that can be deferred is recorded to be processed in future idle cycles, as long as our forecaster predicts that the storage pool has remaining capacity. The smart background scheduler adopts a resource partitioning model that allows both foreground and background IO to execute together as long as foreground IOs are not impacted, harnessing any free cycles to clear background debt. Using traces from VDI and web services applications, we show how our technique can outperform a static policy that sets fixed limits on the deferred background debt, reducing SLO violations from 54.6% (when using a fixed background debt watermark) to only 6.2% when dynamically adjusted by our smart background scheduler.

Citations: 0

Age of Information in an Overtake-Free Network of Quasi-Reversible Queues

2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) Pub Date : 2020-05-28 DOI: 10.1109/MASCOTS50786.2020.9285958

I. Koukoutsidis

Citations: 2