Uncertainty-Aware Online Time Series Multi-Step Forecasting Framework in Cloud Systems
Authors: Jiadong Chen; Yang Luo; Xiuqi Huang; Fuxin Jiang; Yangguang Shi; Tieying Zhang; Xiaofeng Gao
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 5, pp. 3277-3290 (impact factor 10.4; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2026-03-01 (published online 2026-03-16)
DOI: 10.1109/TKDE.2026.3674583
URL: https://ieeexplore.ieee.org/document/11435627/
Citations: 0
Abstract
Accurate resource planning in large-scale systems relies on reliable predictions of future workloads, a task made difficult by workload variability and dynamism. Previous prediction methods either fail to handle the changing dynamics of the series or are black boxes that do not admit effective theoretical analysis. To address these issues, we design an ensemble framework, Interval Prediction with Online Chasing (IPOC), tailored for multi-step interval forecasting in real-time systems. On the theoretical side, we formulate the task as a Dynamic Deterministic Markov Decision Process (Dd-MDP), which lets us analyze problem solvability and derive conditions for the existence of feasible solutions. By incorporating the proposed Adaptive Copula Conformal Inference (ACCI) module and a Chasing Oracle, IPOC captures the changing dynamics and temporal dependencies to enable multi-step forecasting. The framework integrates online learning theory with time series forecasting, so that it is both theoretically rigorous and practically effective. Theoretical analysis underpins IPOC's effectiveness, demonstrating sublinear regret and adherence to confidence interval specifications: the chasing regret of the Chasing Oracle is $O(L_{c})$, and the overall regret of IPOC is $O(\sqrt{L_{c} T \log |\mathcal{F}|})$. Empirically, IPOC is validated through extensive experiments on five real-world datasets, including public datasets and different types of workload collected from Bytedance Cloud, with comparisons against 25 baselines at 4 forecasting horizons (1/5/10/30). Specifically, IPOC achieves an average reduction of over 20% in RMSE/MAE/SMAPE/$\rho$-risk compared to baselines across the five datasets. We also apply our model to a case study on predictive auto-scaling in real large-scale cloud systems to validate its utility.
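The abstract reports results in RMSE, MAE, SMAPE, and $\rho$-risk without defining them. The sketch below shows one common way to compute these four metrics, for readers who want to reproduce a comparison of this kind. It is not taken from the paper: the function names are hypothetical, the SMAPE variant (percent scale, zero-denominator terms counted as 0) and the $\rho$-risk normalization (twice the summed pinball loss divided by the summed absolute targets) are assumed conventions, and the paper's exact definitions may differ.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of point forecasts."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error of point forecasts."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant: 2|y - p| / (|y| + |p|))."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        denom = abs(y) + abs(p)
        if denom > 0:  # a term with y == p == 0 contributes zero error
            total += 2.0 * abs(y - p) / denom
    return 100.0 * total / len(y_true)


def rho_risk(y_true, q_pred, rho):
    """Normalized quantile loss for forecasts q_pred of the rho-quantile.

    Assumed convention: 2 * sum(pinball loss) / sum(|y|).
    """
    pinball = sum(
        max(rho * (y - q), (rho - 1.0) * (y - q)) for y, q in zip(y_true, q_pred)
    )
    return 2.0 * pinball / sum(abs(y) for y in y_true)


if __name__ == "__main__":
    # Toy workload values, made up purely for illustration.
    observed = [10.0, 20.0, 30.0]
    forecast = [12.0, 18.0, 33.0]
    print("RMSE:", rmse(observed, forecast))
    print("MAE:", mae(observed, forecast))
    print("SMAPE (%):", smape(observed, forecast))
    print("0.5-risk:", rho_risk(observed, forecast, 0.5))
    print("0.9-risk:", rho_risk(observed, forecast, 0.9))
```

Under this convention, the 0.5-risk of a point forecast reduces to the summed absolute error divided by the summed absolute targets, which gives a quick sanity check when comparing interval and point forecasts on the same series.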
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.