Objective-Driven Differentiable Optimization of Traffic Prediction and Resource Allocation for Split AI Inference Edge Networks

Xinchen Lyu;Yuewei Li;Ying He;Chenshan Ren;Wei Ni;Ren Ping Liu;Pengcheng Zhu;Qimei Cui
{"title":"Objective-Driven Differentiable Optimization of Traffic Prediction and Resource Allocation for Split AI Inference Edge Networks","authors":"Xinchen Lyu;Yuewei Li;Ying He;Chenshan Ren;Wei Ni;Ren Ping Liu;Pengcheng Zhu;Qimei Cui","doi":"10.1109/TMLCN.2024.3449831","DOIUrl":null,"url":null,"abstract":"Split AI inference partitions an artificial intelligence (AI) model into multiple parts, enabling the offloading of computation-intensive AI services. Resource allocation is critical for the performance of split AI inference. The challenge arises from the time-sensitivity of many services versus time-varying traffic arrivals and network conditions. The conventional prediction-based resource allocation frameworks have adopted separate traffic prediction and resource optimization modules, which may be inefficient due to discrepancies between the traffic prediction accuracy and resource optimization objective. This paper proposes a new, objective-driven, differentiable optimization framework that integrates traffic prediction and resource allocation for split AI inference. The resource optimization problem (aimed to maximize network revenue while adhering to service and network constraints) is designed to be embedded as the output layer following the traffic prediction module. As such, the traffic prediction module can be trained directly based on the network revenue instead of the prediction accuracy, significantly outperforming the conventional prediction-based separate design. Employing the Lagrange duality and Karush-Kuhn-Tucker (KKT) conditions, we achieve efficient forward pass (obtaining resource allocation decisions) and backpropagation (deriving the objective-driven gradients for joint model training) of the output layer. Extensive experiments on different traffic datasets validate the superiority of the proposed approach, achieving up to 38.85% higher network revenue than the conventional predictive baselines.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"2 ","pages":"1178-1192"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10646623","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10646623/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Split AI inference partitions an artificial intelligence (AI) model into multiple parts, enabling the offloading of computation-intensive AI services. Resource allocation is critical for the performance of split AI inference. The challenge arises from the time-sensitivity of many services under time-varying traffic arrivals and network conditions. Conventional prediction-based resource allocation frameworks adopt separate traffic prediction and resource optimization modules, which may be inefficient due to the discrepancy between the traffic prediction accuracy and the resource optimization objective. This paper proposes a new, objective-driven, differentiable optimization framework that integrates traffic prediction and resource allocation for split AI inference. The resource optimization problem (which maximizes network revenue subject to service and network constraints) is embedded as the output layer following the traffic prediction module. As such, the traffic prediction module can be trained directly on the network revenue rather than the prediction accuracy, significantly outperforming the conventional prediction-based separate design. Employing Lagrange duality and the Karush-Kuhn-Tucker (KKT) conditions, we achieve an efficient forward pass (obtaining resource allocation decisions) and backpropagation (deriving the objective-driven gradients for joint model training) of the output layer. Extensive experiments on different traffic datasets validate the superiority of the proposed approach, achieving up to 38.85% higher network revenue than conventional predictive baselines.
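To make the core idea concrete, below is a minimal PyTorch sketch of embedding a resource allocation problem as a differentiable output layer after a traffic predictor and training the predictor on the downstream revenue rather than on prediction accuracy. The quadratic-utility allocation problem, the revenue function, and all names (`KKTAllocationLayer`, `TRUE_MAP`, `N_SERVICES`, `CAPACITY`) are illustrative assumptions, not the paper's formulation; the paper's actual problem, constraints, and KKT-based forward/backward derivations are more general.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N_SERVICES, CAPACITY = 4, 10.0
TRUE_MAP = torch.rand(8, N_SERVICES)   # toy traffic-generating process (assumption)

class KKTAllocationLayer(nn.Module):
    """Closed-form solution of  max_x  w^T x - 0.5 ||x||^2  s.t.  sum(x) = C.

    Stationarity gives x_i = w_i - nu; the equality constraint fixes the
    multiplier nu = (sum(w) - C) / n. The map w -> x is differentiable, so
    objective-driven gradients flow back into the traffic predictor.
    (Non-negativity constraints are omitted to keep the KKT solution
    closed-form; the paper's actual problem is more involved.)
    """
    def forward(self, w: torch.Tensor) -> torch.Tensor:
        nu = (w.sum(dim=-1, keepdim=True) - CAPACITY) / w.shape[-1]
        return w - nu          # resource allocation decision (forward pass)

predictor = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, N_SERVICES))
allocator = KKTAllocationLayer()
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)

def revenue(x: torch.Tensor, traffic: torch.Tensor) -> torch.Tensor:
    # Illustrative revenue: reward capacity that serves realized traffic,
    # lightly penalize over-provisioning (an assumption, not the paper's model).
    return (torch.minimum(x, traffic) - 0.1 * torch.relu(x - traffic)).sum(dim=-1)

for step in range(1000):
    history = torch.randn(64, 8)                   # past traffic features
    traffic = torch.relu(history @ TRUE_MAP)       # realized traffic arrivals
    predicted = predictor(history)                 # traffic prediction module
    allocation = allocator(predicted)              # embedded optimization output layer
    loss = -revenue(allocation, traffic).mean()    # train on revenue, not on MSE
    optimizer.zero_grad()
    loss.backward()                                # gradients pass through the KKT solution
    optimizer.step()
```

Because this toy allocation layer has a closed-form KKT solution, autograd differentiates through it directly; the paper instead derives the forward pass and the objective-driven gradients via Lagrange duality and the KKT conditions for a more general constrained problem.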