Avik Kar;Rahul Singh;Fang Liu;Xin Liu;Ness B. Shroff
Linear Bandits With Side Observations on Networks
IEEE/ACM Transactions on Networking, vol. 32, no. 5, pp. 4222-4237. Published 2024-07-08. DOI: 10.1109/TNET.2024.3422323
https://ieeexplore.ieee.org/document/10589477/
We investigate linear bandits in a network setting in the presence of side-observations across nodes in order to design recommendation algorithms for users connected via social networks. Users in social networks respond to their friends’ activity and, hence, provide information about each other’s preferences. In our model, when a learning algorithm recommends an article to a user, not only does it observe her response (e.g., an ad click) but also the side-observations, i.e., the response of her neighbors if they were presented with the same article. We model these observation dependencies by a graph $\mathcal{G}$ in which nodes correspond to users and edges to social links. We derive a problem/instance-dependent lower-bound on the regret of any consistent algorithm. We propose an optimization-based data-driven learning algorithm that utilizes the structure of $\mathcal{G}$ in order to make recommendations to users and show that it is asymptotically optimal, in the sense that its regret matches the lower-bound as the number of rounds $T \to \infty$. We show that this asymptotically optimal regret is upper-bounded as $O\left(|\chi(\mathcal{G})| \log T\right)$, where $|\chi(\mathcal{G})|$ is the domination number of $\mathcal{G}$. In contrast, a naive application of the existing learning algorithms results in $O\left(N \log T\right)$ regret, where $N$ is the number of users.
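To make the gap between the two bounds concrete, the following minimal sketch computes the domination number of a small graph by brute force and compares it with the number of users. It is an illustration only, not the paper's learning algorithm: the function name `domination_number`, the adjacency-list encoding, and the example graph are all assumptions introduced here. Since computing a minimum dominating set is NP-hard in general, the exhaustive search is practical only for very small graphs.

```python
from itertools import combinations


def domination_number(adj):
    """Return (size, set) of a minimum dominating set, by exhaustive search.

    adj maps each node to a list of its neighbors (undirected graph).
    A set D dominates the graph if every node is in D or adjacent to a node in D.
    """
    nodes = list(adj)
    node_set = set(nodes)
    for k in range(1, len(nodes) + 1):
        for candidate in combinations(nodes, k):
            covered = set()
            for v in candidate:
                covered.add(v)           # a node dominates itself
                covered.update(adj[v])   # and all of its neighbors
            if covered >= node_set:
                return k, set(candidate)
    return 0, set()  # empty graph


# Hypothetical social graph: user 0 is friends with users 1-4, and user 4 with user 5.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}

gamma, dom_set = domination_number(adj)
num_users = len(adj)
print(f"domination number = {gamma}, dominating set = {sorted(dom_set)}, N = {num_users}")
```

In this example two users (0 and 4) suffice to observe everyone, whereas the network has six users, so the factor multiplying $\log T$ in the stated bound scales with 2 rather than 6 (ignoring the constants hidden in the $O(\cdot)$ notation).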
About the journal:
The IEEE/ACM Transactions on Networking’s high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.