Distributed Multi-Task Learning for Stochastic Bandits With Context Distribution and Stage-Wise Constraints
Jiabin Lin; Shana Moothedath
IEEE Transactions on Signal and Information Processing over Networks, vol. 11, pp. 577-591, March 2025. DOI: 10.1109/TSIPN.2025.3566239. https://ieeexplore.ieee.org/document/10981664/
We present conservative multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where $M$ agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents, as in many practical applications that rely on a prediction mechanism to infer context, such as stock market prediction and weather forecasting. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm dynamically constructs a pruned action set for each task in every round, guaranteeing compliance with the constraints. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. For $d$-dimensional linear bandits, we prove an $\widetilde{O}(d\sqrt{MT})$ regret bound and an $O(M^{1.5}d^{3})$ communication bound on the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. We provide a modified algorithm, DiSC-UCB-UB, and show that it achieves the same regret and communication bounds. We empirically validate the performance of our algorithm on synthetic data and on the real-world MovieLens-100K and LastFM datasets, and compare it with existing benchmark algorithms.
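The round structure the abstract describes (prune the action set with a pessimistic estimate, then play the optimistic action) can be sketched as follows. This is only an illustrative single-round skeleton, not the authors' DiSC-UCB: the function name, the fixed confidence radius `beta`, the constraint level `alpha`, and the use of expected feature vectors are all assumptions for the sake of a runnable example; in the paper the features are averaged over the known context distribution and the confidence radius grows with $d$ and $T$.

```python
import numpy as np

def conservative_ucb_round(actions, theta_hat, V, baseline_reward,
                           alpha=0.9, beta=1.0):
    """One round of a conservative UCB step (illustrative sketch).

    actions:          (K, d) expected feature vectors, i.e. features averaged
                      over the known context distribution.
    theta_hat:        (d,) current ridge estimate of the unknown parameter.
    V:                (d, d) Gram matrix defining the confidence ellipsoid.
    baseline_reward:  known per-round baseline reward.
    alpha:            stage-wise constraint level (assumed form: the lower
                      confidence bound of a played action must stay above
                      alpha * baseline_reward).
    beta:             confidence radius, taken constant here for simplicity.
    Returns the index of the chosen action, or None if no action is provably
    safe (the agent would then fall back to the baseline action).
    """
    V_inv = np.linalg.inv(V)
    means = actions @ theta_hat
    # Per-action confidence width: beta * sqrt(a^T V^{-1} a).
    widths = beta * np.sqrt(np.einsum("kd,dj,kj->k", actions, V_inv, actions))
    ucb, lcb = means + widths, means - widths

    # Prune: keep only actions whose pessimistic estimate satisfies the
    # stage-wise constraint -- this is the "pruned action set".
    safe = lcb >= alpha * baseline_reward
    if not safe.any():
        return None
    candidates = np.where(safe)[0]
    # Among safe actions, play the optimistic (max-UCB) one.
    return int(candidates[np.argmax(ucb[candidates])])
```

In the distributed setting, each of the $M$ agents would run such a step on its own task and periodically synchronize its Gram matrix and estimate through the central server.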
Journal description:
The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g. time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, and applications of distributed signal processing.