{"title":"μSteal","authors":"Amirhossein Mirhosseini, T. Wenisch","doi":"10.1145/3447818.3463529","DOIUrl":null,"url":null,"abstract":"Modern internet services are moving towards distributed microservice architectures, wherein a complex application is decomposed into numerous discrete microservices to improve programmability, reliability, manageability, and scalability. A key property of microservice-based architectures is that common microservices may be shared by multiple end-to-end cloud services. As an example, a speech-recognition microservice might serve as an early node in the microservice graphs of several end-to-end services. However, given the dissimilarities across microservice graphs and varying end-to-end latency constraints across services, shared microservices may need to operate under differing latency constraints for each service. As a result, in existing systems, most providers either deploy multiple instance pools for each latency constraint, or require all requests to needlessly meet the most stringent constraint. In this paper, we argue that sharing microservice instances across multiple services can reduce significantly the number of instances, especially under highly asymmetric latency constraints. We propose a request scheduling mechanism, called μSteal, which leverages preemptive work and resource stealing to schedule the arriving requests to cores within a ``mixed-criticality'' microservice instance. μSteal provisions ``core reservations'' for each request class based on their latency requirements, but allows a class to steal cores from other classes if they would otherwise remain idle. But, when a class requires its full reservation, μSteal preempts stolen cores, returning them to their reserved class. μSteal employs a runtime feedback controller augmented by a queuing theory-based analytical model to tune core reservations across classes, seeking to maximize the request throughput within each instance while meeting all classes' latency constraints. We show that μSteal reduces required instances for several shared microservice deployments by 1.29x as compared to deploying multiple, segregated instance pools.","PeriodicalId":73273,"journal":{"name":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","volume":"136 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447818.3463529","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Modern internet services are moving towards distributed microservice architectures, wherein a complex application is decomposed into numerous discrete microservices to improve programmability, reliability, manageability, and scalability. A key property of microservice-based architectures is that common microservices may be shared by multiple end-to-end cloud services. As an example, a speech-recognition microservice might serve as an early node in the microservice graphs of several end-to-end services. However, given the dissimilarities across microservice graphs and varying end-to-end latency constraints across services, shared microservices may need to operate under differing latency constraints for each service. As a result, in existing systems, most providers either deploy multiple instance pools for each latency constraint, or require all requests to needlessly meet the most stringent constraint. In this paper, we argue that sharing microservice instances across multiple services can reduce significantly the number of instances, especially under highly asymmetric latency constraints. We propose a request scheduling mechanism, called μSteal, which leverages preemptive work and resource stealing to schedule the arriving requests to cores within a ``mixed-criticality'' microservice instance. μSteal provisions ``core reservations'' for each request class based on their latency requirements, but allows a class to steal cores from other classes if they would otherwise remain idle. But, when a class requires its full reservation, μSteal preempts stolen cores, returning them to their reserved class. μSteal employs a runtime feedback controller augmented by a queuing theory-based analytical model to tune core reservations across classes, seeking to maximize the request throughput within each instance while meeting all classes' latency constraints. We show that μSteal reduces required instances for several shared microservice deployments by 1.29x as compared to deploying multiple, segregated instance pools.