Inference serving with end-to-end latency SLOs over dynamic edge networks

IF 1.3 | Zone 4, Computer Science | JCR Q3, Computer Science, Theory & Methods

Real-Time Systems Pub Date : 2024-02-06 DOI:10.1007/s11241-024-09418-4

Vinod Nigade, Pablo Bauszat, Henri Bal, Lin Wang


Citations: 0

Abstract

While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests must be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data adaptation and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, in which the decisions for data and DNN adaptation are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that latency SLOs are fulfilled while the overall inference accuracy is maximized. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
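The accuracy-versus-latency tradeoff described in the abstract can be illustrated with a small sketch. This is not Jellyfish's algorithm: it handles a single user and ignores the collective, multi-user coordination the paper emphasizes. It only shows the general idea of choosing the most accurate pairing of input size and DNN variant whose estimated end-to-end latency (transfer plus inference) fits within the SLO. The names `Config`, `estimated_latency_ms`, and `pick_config`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """One (input size, DNN variant) pairing. All values are made up."""
    name: str
    frame_kbytes: float  # payload size after data adaptation (1 kB = 1000 bytes)
    infer_ms: float      # assumed inference time of this DNN variant
    accuracy: float      # assumed accuracy of this pairing, in [0, 1]


def estimated_latency_ms(cfg: Config, bandwidth_mbps: float) -> float:
    """Estimate end-to-end latency as transfer time plus inference time."""
    # 1 Mbps equals 1 kilobit per millisecond, so kilobits / Mbps gives ms.
    transfer_ms = (cfg.frame_kbytes * 8.0) / bandwidth_mbps
    return transfer_ms + cfg.infer_ms


def pick_config(
    configs: Sequence[Config], bandwidth_mbps: float, slo_ms: float
) -> Optional[Config]:
    """Return the most accurate config that fits the SLO, or None."""
    feasible = [
        c for c in configs
        if estimated_latency_ms(c, bandwidth_mbps) <= slo_ms
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Hypothetical configurations, ordered from cheapest to most accurate.
CONFIGS = [
    Config("small", frame_kbytes=50.0, infer_ms=10.0, accuracy=0.60),
    Config("medium", frame_kbytes=150.0, infer_ms=20.0, accuracy=0.70),
    Config("large", frame_kbytes=400.0, infer_ms=40.0, accuracy=0.80),
]

if __name__ == "__main__":
    for bw in (100.0, 50.0, 5.0, 2.0):
        choice = pick_config(CONFIGS, bandwidth_mbps=bw, slo_ms=100.0)
        print(bw, choice.name if choice else "no feasible config")
```

With a 100 ms SLO, the chosen configuration degrades from "large" to "small" as the assumed bandwidth drops, and no configuration is feasible once even the smallest payload cannot be transferred in time.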


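Source journal: Real-Time Systems (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 2.90

Self-citation rate: 7.70%

Publication volume:

Review time: 6 months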

Journal description: Papers published in Real-Time Systems cover, among others, the following topics: requirements engineering, specification and verification techniques, design methods and tools, programming languages, operating systems, scheduling algorithms, architecture, hardware and interfacing, dependability and safety, distributed and other novel architectures, wired and wireless communications, wireless sensor systems, distributed databases, artificial intelligence techniques, expert systems, and application case studies. Applications are found in command and control systems, process control, automated manufacturing, flight control, avionics, space avionics and defense systems, shipborne systems, vision and robotics, pervasive and ubiquitous computing, and in an abundance of embedded systems.