Multi-tenant mobile offloading systems for real-time computer vision applications

Proceedings of the 20th International Conference on Distributed Computing and Networking. Pub Date: 2019-01-04. DOI: 10.1145/3288599.3288634

Zhou Fang, Jeng-Hau Lin, M. Srivastava, Rajesh K. Gupta

Citations: 9

Abstract

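Offloading techniques enable many emerging computer vision applications on mobile platforms by executing compute-intensive tasks on resource-rich servers. Although significant research effort has been devoted to optimizing mobile offloading frameworks, most previous work is evaluated in a single-tenant setting, in which a server is assigned to a single client. In practical scenarios, however, servers must handle tasks from many clients running diverse applications, and contention for shared server resources may degrade application performance. In this work, we study scheduling techniques that improve serving performance in multi-tenant mobile offloading systems, covering computer vision algorithms running on CPUs and deep neural networks (DNNs) running on GPUs. For CPU workloads, we present methods that mitigate resource contention and reduce delay using a Plan-Schedule approach. The planning phase predicts future workloads from all clients, estimates contention, and adjusts future task start times to remove or reduce contention. The scheduling phase dispatches arriving offloaded tasks to the server that minimizes contention. For DNN workloads running on GPUs, we propose adaptive batching algorithms that use batch size, model complexity, and system load to achieve the best Quality of Service (QoS), measured by the accuracy and delay of DNN tasks. We demonstrate the improvement in serving performance using several real-world applications with different server deployments.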
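The two phases of the Plan-Schedule approach can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how contention is estimated or how start times are adjusted, so the `Task` type, the overlap-based `contention` metric, the `max_shift` bound, and the function names `plan` and `schedule` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    client: str
    start: float     # predicted start time, in seconds (assumed unit)
    duration: float  # predicted processing time, in seconds (assumed unit)

    @property
    def end(self) -> float:
        return self.start + self.duration


def overlap(a: Task, b: Task) -> float:
    """Length of time during which two tasks would run concurrently."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def contention(task: Task, queue: list[Task]) -> float:
    """Assumed contention metric: total overlap with tasks already queued."""
    return sum(overlap(task, other) for other in queue)


def plan(tasks: list[Task], max_shift: float) -> list[Task]:
    """Planning phase (sketch): delay each predicted task by at most
    max_shift so that it starts after the tasks planned before it."""
    planned: list[Task] = []
    for task in sorted(tasks, key=lambda t: t.start):
        busy_until = max((p.end for p in planned), default=task.start)
        shift = min(max(0.0, busy_until - task.start), max_shift)
        planned.append(Task(task.client, task.start + shift, task.duration))
    return planned


def schedule(task: Task, servers: dict[str, list[Task]]) -> str:
    """Scheduling phase (sketch): dispatch an arriving task to the server
    whose queue yields the least estimated contention."""
    best = min(servers, key=lambda name: contention(task, servers[name]))
    servers[best].append(task)
    return best
```

In this sketch, `plan` trades a bounded start-time delay for less overlap, and `schedule` greedily picks the least-contended server; ties go to the first server in dictionary order.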
