All-you-can-inference: serverless DNN model inference suite

Authors: Subin Park, J. Choi, Kyungyong Lee
DOI: 10.1145/3565382.3565878
Published in: Proceedings of the Eighth International Workshop on Serverless Computing, 2022-11-07
Citations: 1
Abstract
Serverless computing has become prevalent and is widely adopted for various applications. Deep learning inference tasks are well suited to serverless deployment because their task arrival rates fluctuate. When serving a Deep Neural Network (DNN) model in a serverless computing environment, there are many performance optimization opportunities, including hardware support, model graph optimization, hardware-agnostic model compilation, and memory-size and batch-size configuration, among others. Although serverless computing frees users from cloud resource management overhead, it is still very challenging to find an optimal serverless DNN inference environment within such a large space of configuration options. In this work, we propose All-You-Can-Inference (AYCI), which helps users find an optimally operating DNN inference deployment in a publicly available serverless computing environment. We have built the proposed system as a service using various fully managed cloud services and open-sourced it to help DNN application developers build an optimal serving environment. The prototype implementation and initial experimental results illustrate the difficulty of finding an optimal DNN inference environment under varying performance.
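The configuration search the abstract describes can be sketched as a simple grid sweep over memory and batch sizes. This is a minimal illustration, not the paper's actual method: `measure_inference_latency` is a hypothetical stand-in with a toy cost model; a real deployment would instead invoke the deployed serverless function through the cloud provider's API and time the response.

```python
import itertools

# Candidate serverless configurations of the kind the abstract mentions:
# memory allocation (MB) and inference batch size (values are illustrative).
MEMORY_SIZES_MB = [512, 1024, 2048, 4096]
BATCH_SIZES = [1, 4, 8, 16]

def measure_inference_latency(memory_mb, batch_size):
    """Hypothetical stand-in for timing one serverless inference call.
    Toy cost model: compute time scales with batch size and inversely
    with allocated memory; a fixed invocation overhead is amortized
    across the batch. Returns per-sample latency in milliseconds."""
    compute_ms = 800.0 * batch_size / (memory_mb / 512.0)
    overhead_ms = 50.0  # fixed per-invocation overhead (cold path ignored)
    return (compute_ms + overhead_ms) / batch_size

def find_best_config():
    """Exhaustively sweep the configuration grid and return the
    (memory_mb, batch_size) pair with the lowest per-sample latency."""
    return min(
        itertools.product(MEMORY_SIZES_MB, BATCH_SIZES),
        key=lambda cfg: measure_inference_latency(*cfg),
    )

if __name__ == "__main__":
    print(find_best_config())  # → (4096, 16) under this toy model
```

Even this tiny grid has 16 configurations; once hardware targets, compilation backends, and model-graph optimizations are added as further axes, the product of options grows multiplicatively, which is the search-space explosion that motivates automating the sweep as a service.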