{"title":"ML-ACE: Machine Learning Admission Control at the Edge","authors":"Josh Minor","doi":"10.1109/SEC54971.2022.00048","DOIUrl":null,"url":null,"abstract":"ML inference has become an increasingly important workload for low-power, near-data edge computing platforms. There is a large existing body of work on how to optimize a trained model for inference on a resource-constrained device, however much of the work does not consider optimizations in how the model will be used by clients in the system. In this space, inference servers emerged to provide a client-server paradigm for inference, offering portable, practical client libraries for users of ML systems. These servers handle batching of requests, runtime optimizations, and placement of multiple replicas of models on CPU/GPU to maximize inference efficiency. Unlike the data center, much infrastructure at the edge lacks the ease in ability to recruit new machines to scale out these servers to meet increasing request demand. Because of this, efficient scheduling of models on these edge platforms is critical. This work presents ML-ACE, a system to systematically schedule ML inference on resource-constrained edge computing platforms. ML-ACE extends the existing client-server paradigm for inference serving by providing admission control, preventing user inference requests from over-saturating system resources.","PeriodicalId":364062,"journal":{"name":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SEC54971.2022.00048","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
ML inference has become an increasingly important workload for low-power, near-data edge computing platforms. There is a large body of existing work on optimizing a trained model for inference on a resource-constrained device; however, much of this work does not consider how the model will actually be used by clients in the system. In this space, inference servers have emerged to provide a client-server paradigm for inference, offering portable, practical client libraries to users of ML systems. These servers handle batching of requests, runtime optimizations, and placement of multiple model replicas on CPU/GPU to maximize inference efficiency. Unlike the data center, infrastructure at the edge typically cannot easily recruit new machines to scale out these servers as request demand grows. Because of this, efficient scheduling of models on edge platforms is critical. This work presents ML-ACE, a system that systematically schedules ML inference on resource-constrained edge computing platforms. ML-ACE extends the existing client-server paradigm for inference serving with admission control, preventing user inference requests from over-saturating system resources.
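
The abstract does not give implementation details, but the core idea, admitting or rejecting client inference requests against a fixed resource budget rather than queuing them without bound, can be illustrated with a minimal sketch. The concurrency budget, rejection timeout, and `run_inference` stub below are illustrative assumptions, not the paper's actual policy or API.

```python
import threading


class AdmissionController:
    """Hypothetical sketch of admission control for an inference server:
    requests are gated behind a bounded concurrency budget so the edge
    device is never over-saturated. The budget and rejection policy are
    assumptions for illustration, not ML-ACE's scheduling policy."""

    def __init__(self, max_inflight: int):
        self._slots = threading.Semaphore(max_inflight)

    def try_admit(self, timeout_s: float = 0.0) -> bool:
        # Bounded-wait acquisition: if no slot frees up within the
        # timeout, the request is rejected instead of queuing
        # indefinitely and exhausting system resources.
        return self._slots.acquire(timeout=timeout_s)

    def release(self) -> None:
        self._slots.release()


def serve_request(controller: AdmissionController, run_inference, payload):
    """Wrap an arbitrary inference call (e.g. a client stub for an
    inference server) with admission control. `run_inference` is a
    placeholder for whatever backend the deployment uses."""
    if not controller.try_admit(timeout_s=0.05):
        return {"status": "rejected", "reason": "server saturated"}
    try:
        return {"status": "ok", "result": run_inference(payload)}
    finally:
        controller.release()
```

In this sketch the client-facing behavior matches the paradigm described above: admitted requests proceed to the usual batching and placement machinery, while rejected requests receive an immediate signal that the platform is saturated, which a caller can use to back off or retry.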