ScaleServe
Ali Jahanshahi, M. Chow, Daniel Wong
Proceedings of the 14th Workshop on General Purpose Processing Using GPU, April 3, 2022
DOI: 10.1145/3530390.3532735

Abstract: We present ScaleServe, a scalable multi-GPU machine learning inference system that (1) is built on an end-to-end open-source software stack, (2) is hardware vendor-agnostic, and (3) is designed with modular components, making its configuration knobs easy to modify and extend. ScaleServe also reports detailed performance metrics from each layer of the inference server, allowing designers to pinpoint bottlenecks. We demonstrate ScaleServe's serving scalability on an 8-GPU server with several machine learning tasks, including computer vision and natural language processing. The performance results for ResNet152 show that ScaleServe scales well on a multi-GPU platform.
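The abstract does not describe ScaleServe's scheduling policy, but the core idea of a scale-out inference server — spreading incoming requests across several GPU workers so throughput grows with GPU count — can be sketched with a toy dispatcher. The round-robin policy, the `ToyScaleOutServer` name, and the 8-GPU configuration below are illustrative assumptions, not ScaleServe's actual implementation.

```python
from itertools import cycle

class ToyScaleOutServer:
    """Toy multi-GPU request dispatcher.

    Assumption: round-robin placement across workers, used here only to
    illustrate scale-out serving; the real system's policy is not stated
    in the abstract.
    """

    def __init__(self, num_gpus):
        self.num_gpus = num_gpus
        self._next_gpu = cycle(range(num_gpus))          # round-robin iterator
        self.assignments = {g: [] for g in range(num_gpus)}

    def dispatch(self, request_id):
        """Assign one inference request to the next GPU in rotation."""
        gpu = next(self._next_gpu)
        self.assignments[gpu].append(request_id)
        return gpu

# 16 requests over an 8-GPU server: each worker ends up with 2 requests,
# so aggregate throughput scales with the number of GPUs (ignoring
# batching, model placement, and PCIe/NVLink transfer costs).
server = ToyScaleOutServer(num_gpus=8)
placements = [server.dispatch(i) for i in range(16)]
loads = [len(reqs) for reqs in server.assignments.values()]
```

Under this even-load assumption, per-layer timing of each request (queueing, host-to-device copy, kernel execution) is what lets a designer pinpoint bottlenecks, which is the role the abstract attributes to ScaleServe's performance metrics.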