Getting the Best Out of Both Worlds: Algorithms for Hierarchical Inference at the Edge
Authors: Vishnu Narayanan Moothedath; Jaya Prakash Champati; James Gross
Journal: IEEE Transactions on Machine Learning in Communications and Networking, vol. 2, pp. 280-297
Publication date: 2024-02-14
DOI: 10.1109/TMLCN.2024.3366501
URL: https://ieeexplore.ieee.org/document/10436693/
Citations: 0
Abstract
We consider a resource-constrained Edge Device (ED), such as an IoT sensor or a microcontroller unit, embedded with a small-size ML model (S-ML) for a generic classification application, and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of the S-ML is lower than that of the L-ML, offloading all data samples to the ES yields high inference accuracy, but it defeats the purpose of embedding the S-ML on the ED and forfeits the reduced latency, bandwidth savings, and energy efficiency of local inference. To get the best out of both worlds, i.e., the benefits of inference on the ED and the benefits of inference on the ES, we explore the idea of Hierarchical Inference (HI), wherein an S-ML inference is accepted only when it is correct; otherwise, the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible because the correctness of the S-ML inference is not known to the ED. We thus propose an online meta-learning framework that the ED can use to predict the correctness of the S-ML inference. In particular, we propose to use the probability of the maximum-probability class output by the S-ML for a data sample to decide whether or not to offload it. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with a continuous expert space. For a full feedback scenario, where the ED receives feedback on the correctness of the S-ML once it accepts the inference, we propose the HIL-F algorithm and prove a sublinear regret bound of $\sqrt{n\ln(1/\lambda_{\text{min}})/2}$ without any assumption on the smoothness of the loss function, where $n$ is the number of data samples and $\lambda_{\text{min}}$ is the minimum difference between any two distinct maximum probability values across the data samples. For a no-local feedback scenario, where the ED does not receive the ground truth for the classification, we propose the HIL-N algorithm and prove that it has an $O\left(n^{2/3}\ln^{1/3}(1/\lambda_{\text{min}})\right)$ regret bound. We evaluate and benchmark the performance of the proposed algorithms for an image classification application using four datasets: Imagenette, Imagewoof, MNIST, and CIFAR-10.
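
The threshold-based offloading decision described in the abstract can be illustrated with a small Python sketch. It is not the HIL-F algorithm from the paper: it runs a standard exponentially weighted forecaster over a finite, hand-picked grid of candidate thresholds, whereas the paper treats a continuous expert space. The cost model (an offloading cost `beta`, and a cost of 1 for accepting a wrong S-ML output), the learning rate `eta`, and all function names are assumptions made for illustration. The sketch also assumes that the correctness of the S-ML output is revealed in every round. Only `hil_f_regret_bound` is taken directly from the abstract: it evaluates the stated bound $\sqrt{n\ln(1/\lambda_{\text{min}})/2}$.

```python
import math
import random


def hi_decision(p_max: float, threshold: float) -> str:
    """Accept the S-ML output if its top-class probability reaches the threshold."""
    return "accept" if p_max >= threshold else "offload"


def loss(p_max: float, threshold: float, sml_correct: bool, beta: float) -> float:
    """Assumed cost model: offloading costs beta, accepting a wrong output costs 1."""
    if hi_decision(p_max, threshold) == "offload":
        return beta
    return 0.0 if sml_correct else 1.0


def run_exponential_weights(samples, thresholds, beta, eta, seed=0):
    """Exponentially weighted forecaster over a finite grid of thresholds.

    samples: iterable of (p_max, sml_correct) pairs, one per data sample.
    Returns the total loss incurred and the final normalised weights.
    """
    rng = random.Random(seed)
    weights = [1.0 / len(thresholds)] * len(thresholds)
    total_loss = 0.0
    for p_max, sml_correct in samples:
        # Sample one threshold (expert) in proportion to its current weight.
        theta = rng.choices(thresholds, weights=weights, k=1)[0]
        total_loss += loss(p_max, theta, sml_correct, beta)
        # Assumption: correctness is known each round, so every candidate
        # threshold can be scored and down-weighted by its own loss.
        weights = [
            w * math.exp(-eta * loss(p_max, t, sml_correct, beta))
            for w, t in zip(weights, thresholds)
        ]
        norm = sum(weights)
        weights = [w / norm for w in weights]  # renormalise to avoid underflow
    return total_loss, weights


def hil_f_regret_bound(n: int, lambda_min: float) -> float:
    """Evaluate sqrt(n * ln(1 / lambda_min) / 2), the bound stated for HIL-F."""
    return math.sqrt(n * math.log(1.0 / lambda_min) / 2.0)


if __name__ == "__main__":
    # Synthetic stream (hypothetical): S-ML is right only when it is confident.
    gen = random.Random(1)
    stream = []
    for _ in range(1000):
        p = gen.uniform(0.1, 1.0)
        stream.append((p, p > 0.6))
    grid = [i / 10 for i in range(1, 11)]
    total, final_weights = run_exponential_weights(stream, grid, beta=0.3, eta=0.1)
    best = grid[max(range(len(grid)), key=lambda i: final_weights[i])]
    print("total loss:", total, "highest-weight threshold:", best)
    print("HIL-F bound for n=1000, lambda_min=1e-3:", hil_f_regret_bound(1000, 1e-3))
```

In this sketch each threshold acts as one expert. A finer grid approximates the continuous expert space more closely but adds per-round work, which is a trade-off the sketch does not attempt to optimise.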