Dheeraj Chahal, Mayank Mishra, S. Palepu, R. Singh, Rekha Singhal
Pay-as-you-Train: Efficient ways of Serverless Training
2022 IEEE International Conference on Cloud Engineering (IC2E), published 2022-09-01
DOI: 10.1109/IC2E55432.2022.00020 (https://doi.org/10.1109/IC2E55432.2022.00020)
Citations: 1
Abstract
Serverless (FaaS) architecture is emerging as a paradigm of choice for many application types, including event-triggered, query processing, and machine learning (ML). The use of serverless platforms for ML inference is well known, but their applicability to model training is still under exploration. This paper presents an efficient “pay-as-you-train” methodology for training large deep learning models using serverless cloud services for compute and data management. Serverless compute (such as AWS Lambda) and serverless data management systems (such as the AWS key-value store DynamoDB) impose restrictions on computing time and on the size of allowed data objects, respectively. We present a novel approach for training deep learning models that overcomes the limitations imposed by the underlying serverless platforms. We also present an analytical model to study the performance and cost of training when different data management services (such as the AWS object store S3, in-memory Memcached, and DynamoDB) are used as the communication channel with serverless platforms. Additionally, we compare the performance and cost of these cloud services. Our optimization techniques improve performance, and hence reduce the cost of training, by a factor of 1.2x to 5.5x with these services.
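The abstract notes that serverless data stores cap the size of a single data object. One generic way to work within such a cap is to split a serialized payload (for example, model weights) into numbered chunks and reassemble them on read. The sketch below is an illustration under stated assumptions, not the paper's method: the function names, the item layout, and the 350 KB threshold are hypothetical, chosen to leave headroom under DynamoDB's documented 400 KB item limit.

```python
from typing import Dict, List

# Assumed per-object cap in bytes (hypothetical value): DynamoDB documents a
# 400 KB item limit, so this leaves headroom for keys and attribute names.
MAX_CHUNK_BYTES = 350 * 1024


def split_into_chunks(payload: bytes, max_bytes: int = MAX_CHUNK_BYTES) -> List[bytes]:
    """Split a serialized payload into pieces no larger than max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    if not payload:
        return [b""]  # keep a single empty chunk so the key still exists
    return [payload[i:i + max_bytes] for i in range(0, len(payload), max_bytes)]


def to_items(key: str, payload: bytes, max_bytes: int = MAX_CHUNK_BYTES) -> List[Dict]:
    """Wrap each chunk in a record that carries its position and the total count."""
    chunks = split_into_chunks(payload, max_bytes)
    return [
        {"key": key, "part": index, "total": len(chunks), "data": chunk}
        for index, chunk in enumerate(chunks)
    ]


def reassemble(items: List[Dict]) -> bytes:
    """Rebuild the payload from records that may arrive in any order."""
    ordered = sorted(items, key=lambda item: item["part"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        raise ValueError("missing chunks")
    return b"".join(item["data"] for item in ordered)
```

Each record could then be written as a separate item to whichever store is used as the communication channel. The chunk size is a tuning parameter: smaller chunks mean more requests, which affects both latency and per-request cost.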