Distributed Multi-Task Learning for Stochastic Bandits With Context Distribution and Stage-Wise Constraints

Impact Factor: 3.0 · CAS Region 3 (Computer Science) · JCR Q2 (Engineering, Electrical & Electronic)
Jiabin Lin, Shana Moothedath
DOI: 10.1109/TSIPN.2025.3566239
Journal: IEEE Transactions on Signal and Information Processing over Networks, vol. 11, pp. 577-591, March 2025
Cited by: 0

Abstract

We present conservative multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where $M$ agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents, as in many practical applications that rely on a prediction mechanism to infer context, such as stock market prediction and weather forecasting. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm dynamically constructs a pruned action set for each task in every round, guaranteeing compliance with the constraints. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. For $d$-dimensional linear bandits, we prove an $\widetilde{O}(d\sqrt{MT})$ regret bound and an $O(M^{1.5}d^{3})$ communication bound on the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. We provide a modified algorithm, DiSC-UCB-UB, and show that it achieves the same regret and communication bounds. We empirically validated the performance of our algorithm on synthetic data and on the real-world MovieLens-100K and LastFM datasets, and compared it against existing benchmark algorithms.
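To make the stage-wise constraint concrete, the following is a minimal illustrative sketch (not the authors' DiSC-UCB implementation) of the action-pruning idea the abstract describes: in each round, an agent keeps only actions whose pessimistic (lower-confidence) reward estimate under a ridge-regression model still meets a fraction of a known baseline reward. The function name, the `alpha`/`beta` parameters, and the feature representation are all hypothetical simplifications.

```python
import numpy as np

def pruned_action_set(actions, theta_hat, V, baseline_reward, alpha=0.05, beta=1.0):
    """Keep only actions whose lower confidence bound on reward satisfies a
    stage-wise constraint relative to a baseline reward.

    actions:         (K, d) array of action feature vectors (e.g., features
                     averaged under the context distribution)
    theta_hat:       (d,) ridge estimate of the unknown parameter
    V:               (d, d) regularized design (Gram) matrix
    baseline_reward: scalar baseline reward r_b
    alpha:           allowed per-stage loss fraction relative to r_b
    beta:            confidence-width multiplier
    """
    V_inv = np.linalg.inv(V)
    keep = []
    for x in actions:
        width = beta * np.sqrt(x @ V_inv @ x)       # confidence ellipsoid width
        lcb = x @ theta_hat - width                 # pessimistic reward estimate
        if lcb >= (1.0 - alpha) * baseline_reward:  # stage-wise safety check
            keep.append(x)
    # If pruning empties the set, a conservative algorithm would fall back
    # to playing the baseline action; we signal that case with None.
    return np.array(keep) if keep else None
```

An agent would then run a standard UCB selection over the pruned set, so exploration never risks violating the per-round performance constraint; the distributed aspect (synchronizing `theta_hat` and `V` through a central server) is omitted here for brevity.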
Source Journal
IEEE Transactions on Signal and Information Processing over Networks
Category: Computer Science – Computer Networks and Communications
CiteScore: 5.80
Self-citation rate: 12.50%
Annual publications: 56
Journal Description: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g., time and space) to processing of signals and information (data) defined over networks, which may be dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, as well as applications of distributed signal processing.