Distributed Multi-Task Learning for Stochastic Bandits With Context Distribution and Stage-Wise Constraints

Impact Factor: 3.0 · CAS Region 3 (Computer Science) · JCR Q2 (Engineering, Electrical & Electronic)
Jiabin Lin, Shana Moothedath
DOI: 10.1109/TSIPN.2025.3566239
Journal: IEEE Transactions on Signal and Information Processing over Networks, vol. 11, pp. 577-591, March 2025
Cited by: 0

Abstract

We present conservative multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where $M$ agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents, as in many practical applications that rely on a prediction mechanism to infer context, such as stock market prediction and weather forecasting. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm dynamically constructs a pruned action set for each task in every round, guaranteeing compliance with the constraints. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. For $d$-dimensional linear bandits, we prove an $\widetilde{O}(d\sqrt{MT})$ regret bound and an $O(M^{1.5}d^{3})$ communication bound on the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. We provide a modified algorithm, DiSC-UCB-UB, and show that it achieves the same regret and communication bounds. We empirically validated the performance of our algorithm on synthetic data and on the real-world MovieLens-100K and LastFM datasets, and compared it against existing benchmark algorithms.
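To make the stage-wise constraint concrete, the following is a minimal illustrative sketch (not the authors' DiSC-UCB implementation) of the action-pruning idea the abstract describes: in each round, an agent keeps only actions whose pessimistic (lower-confidence) reward estimate under a ridge-regression model still meets a fraction of a known baseline reward. The function name, the `alpha`/`beta` parameters, and the feature representation are all hypothetical simplifications.

```python
import numpy as np

def pruned_action_set(actions, theta_hat, V, baseline_reward, alpha=0.05, beta=1.0):
    """Keep only actions whose lower confidence bound on reward satisfies a
    stage-wise constraint relative to a baseline reward.

    actions:         (K, d) array of action feature vectors (e.g., features
                     averaged under the context distribution)
    theta_hat:       (d,) ridge estimate of the unknown parameter
    V:               (d, d) regularized design (Gram) matrix
    baseline_reward: scalar baseline reward r_b
    alpha:           allowed per-stage loss fraction relative to r_b
    beta:            confidence-width multiplier
    """
    V_inv = np.linalg.inv(V)
    keep = []
    for x in actions:
        width = beta * np.sqrt(x @ V_inv @ x)       # confidence ellipsoid width
        lcb = x @ theta_hat - width                 # pessimistic reward estimate
        if lcb >= (1.0 - alpha) * baseline_reward:  # stage-wise safety check
            keep.append(x)
    # If pruning empties the set, a conservative algorithm would fall back
    # to playing the baseline action; we signal that case with None.
    return np.array(keep) if keep else None
```

An agent would then run a standard UCB selection over the pruned set, so exploration never risks violating the per-round performance constraint; the distributed aspect (synchronizing `theta_hat` and `V` through a central server) is omitted here for brevity.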
Source Journal
IEEE Transactions on Signal and Information Processing over Networks
Category: Computer Science – Computer Networks and Communications
CiteScore: 5.80
Self-citation rate: 12.50%
Annual publications: 56
Journal Description: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g., time and space) to processing of signals and information (data) defined over networks, which may be dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, as well as applications of distributed signal processing.