MDINFERENCE: Balancing Inference Accuracy and Latency for Mobile Applications

2020 IEEE International Conference on Cloud Engineering (IC2E). Pub Date: 2020-02-16. DOI: 10.1109/IC2E48712.2020.00010

Samuel S. Ogden, Tian Guo


Citations: 13

Abstract

Deep Neural Networks are allowing mobile devices to incorporate a wide range of features into user applications. However, the computational complexity of these models makes it difficult to run them effectively on resource-constrained mobile devices. Prior work approached the problem of supporting deep learning in mobile applications by either decreasing model complexity or utilizing powerful cloud servers. Each of these approaches focuses on only a single aspect of mobile inference and thus often sacrifices overall performance. In this work we introduce a holistic approach to designing mobile deep inference frameworks. We first identify the key goals of accuracy and latency for mobile deep inference and the conditions that must be met to achieve them. We demonstrate our holistic approach through the design of a hypothetical framework called MDINFERENCE. This framework leverages two complementary techniques: a model selection algorithm that chooses from a set of cloud-based deep learning models to improve inference accuracy, and an on-device request duplication mechanism to bound latency. Through empirically driven simulations we show that MDINFERENCE improves aggregate accuracy over static approaches by over 40% without incurring SLA violations. Additionally, we show that with a target latency of 250 ms, MDINFERENCE increased the aggregate accuracy in 99.74% of cases on faster university networks and 96.84% of cases on residential networks.
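The abstract does not spell out either technique, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple greedy rule (pick the most accurate cloud model whose expected inference time fits in the latency budget left after the network transfer) and a fallback rule for request duplication (use the on-device result whenever the cloud response misses the target latency). The names `Model`, `select_model`, and `respond`, and all accuracy and latency figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str          # hypothetical label, not a model from the paper
    accuracy: float    # assumed accuracy in [0, 1]
    latency_ms: float  # assumed expected cloud inference time


def select_model(models: Sequence[Model], sla_ms: float,
                 network_ms: float) -> Optional[Model]:
    """Greedy selection (an assumption): return the most accurate model
    whose inference time fits in the budget remaining after the network
    transfer, or None if no model fits."""
    budget_ms = sla_ms - network_ms
    feasible = [m for m in models if m.latency_ms <= budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m.accuracy)


def respond(cloud_result: Optional[str], cloud_elapsed_ms: float,
            on_device_result: str, sla_ms: float) -> str:
    """Request duplication (simplified): the same request also runs on a
    small on-device model; the cloud answer is used only if it arrived
    within the target latency."""
    if cloud_result is not None and cloud_elapsed_ms <= sla_ms:
        return cloud_result
    return on_device_result


# Hypothetical model zoo; the numbers are illustrative only.
MODELS = [
    Model("small", accuracy=0.70, latency_ms=20.0),
    Model("medium", accuracy=0.76, latency_ms=90.0),
    Model("large", accuracy=0.82, latency_ms=200.0),
]

if __name__ == "__main__":
    # With a 250 ms target and 100 ms spent on the network, 150 ms remain.
    choice = select_model(MODELS, sla_ms=250.0, network_ms=100.0)
    print(choice.name if choice else "on-device only")
```

Under these assumptions a fast network leaves room for a larger, more accurate model, while a slow network pushes the choice toward smaller models and, in the worst case, toward the duplicated on-device result.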



