The Shortest Path Problem in the Bandit Setting
A. György, T. Linder, G. Lugosi
2006 IEEE Information Theory Workshop - ITW '06 Punta del Este, 2006-03-13
DOI: 10.1109/ITW.2006.1633787
Citation count: 3
Abstract
The on-line shortest path problem is considered in the bandit setting. The setting is a weighted directed acyclic graph whose edge weights can change in an arbitrary way. In each round, a decision maker has to pick a path between two distinguished vertices so that the weight of this path, defined as the sum of the weights of its edges, is as small as possible. The decision maker has only limited information on how the edge weights are generated. In particular, the edge weights of the current round are unknown to the decision maker when it chooses a path. After choosing a path, it learns only the weights of the edges that belong to the chosen path. An algorithm is given whose average cumulative loss over n rounds exceeds that of the best path, chosen off-line with knowledge of the entire sequence of edge weights, by a quantity that is proportional to 1/√n and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with complexity linear in the number of rounds n and in the number of edges. This result improves on earlier algorithms, whose performance bounds either depend exponentially on the number of edges or converge to zero at a rate slower than O(1/√n).
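The quantities in the abstract can be made concrete with a small sketch. It is not the authors' algorithm; it only illustrates the setting. It shows a path's weight in one round and the bandit feedback, where only the edges on the chosen path are revealed. It also computes the off-line best fixed path and the average excess loss over it. The function names, the toy graph and its weights are hypothetical. The DAG is assumed to be given with its vertices already in topological order.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]          # directed edge (tail, head)
Weights = Dict[Edge, float]     # edge weights chosen by the environment in one round


def path_loss(path: Sequence[Edge], weights: Weights) -> float:
    """Weight of a path in one round: the sum of the weights of its edges."""
    return sum(weights[e] for e in path)


def bandit_feedback(path: Sequence[Edge], weights: Weights) -> Weights:
    """Bandit feedback: only the weights of edges on the chosen path are revealed."""
    return {e: weights[e] for e in path}


def best_fixed_path(order: List[str], edges: List[Edge], weight_seq: List[Weights],
                    source: str, sink: str) -> Tuple[List[Edge], float]:
    """Off-line benchmark: the single path with the smallest cumulative weight.

    Each edge's weights are summed over all rounds, then a shortest-path
    dynamic program relaxes edges in topological order (valid because the
    graph is acyclic). After the per-edge sums, this is O(vertices + edges).
    """
    total = {e: sum(w[e] for w in weight_seq) for e in edges}
    out_edges: Dict[str, List[Edge]] = {v: [] for v in order}
    for e in edges:
        out_edges[e[0]].append(e)

    dist = {v: float("inf") for v in order}
    pred: Dict[str, Edge] = {}
    dist[source] = 0.0
    for u in order:                       # relax out-edges in topological order
        if dist[u] == float("inf"):
            continue
        for e in out_edges[u]:
            if dist[u] + total[e] < dist[e[1]]:
                dist[e[1]] = dist[u] + total[e]
                pred[e[1]] = e
    if dist[sink] == float("inf"):
        raise ValueError("sink is not reachable from source")

    path: List[Edge] = []                 # walk predecessors back from the sink
    v = sink
    while v != source:
        e = pred[v]
        path.append(e)
        v = e[0]
    return path[::-1], dist[sink]


def average_regret(chosen: List[List[Edge]], order: List[str], edges: List[Edge],
                   weight_seq: List[Weights], source: str, sink: str) -> float:
    """(cumulative loss of the chosen paths - loss of the best fixed path) / n."""
    incurred = sum(path_loss(p, w) for p, w in zip(chosen, weight_seq))
    _, best = best_fixed_path(order, edges, weight_seq, source, sink)
    return (incurred - best) / len(weight_seq)


# Hypothetical toy instance: two s-t paths, s->a->t (upper) and s->b->t (lower).
ORDER = ["s", "a", "b", "t"]
EDGES = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
WEIGHTS = [
    {("s", "a"): 1.0, ("a", "t"): 0.0, ("s", "b"): 0.0, ("b", "t"): 0.0},
    {("s", "a"): 0.0, ("a", "t"): 0.0, ("s", "b"): 1.0, ("b", "t"): 1.0},
]
LOWER = [("s", "b"), ("b", "t")]

print(bandit_feedback(LOWER, WEIGHTS[0]))   # -> {('s', 'b'): 0.0, ('b', 't'): 0.0}
print(best_fixed_path(ORDER, EDGES, WEIGHTS, "s", "t"))  # -> ([('s', 'a'), ('a', 't')], 1.0)
print(average_regret([LOWER, LOWER], ORDER, EDGES, WEIGHTS, "s", "t"))  # -> 0.5
```

In this example the lower path is played in both rounds and loses 2.0 in total. The upper path, chosen in hindsight, loses only 1.0, so the average excess is 0.5. According to the abstract, the paper's algorithm keeps this excess proportional to 1/√n even though it sees only bandit feedback. The sketch only measures the excess and does not implement that algorithm.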