{"title":"序贯双层线性规划的学习","authors":"J. S. Borrero, O. Prokopyev, Denis Sauré","doi":"10.1287/ijoo.2021.0063","DOIUrl":null,"url":null,"abstract":"We consider a framework for sequential bilevel linear programming in which a leader and a follower interact over multiple time periods. In each period, the follower observes the actions taken by the leader and reacts optimally, according to the follower’s own objective function, which is initially unknown to the leader. By observing various forms of information feedback from the follower’s actions, the leader is able to refine the leader’s knowledge about the follower’s objective function and, hence, adjust the leader’s actions at subsequent time periods, which ought to help in maximizing the leader’s cumulative benefit. We show that greedy and robust policies adapted from previous work in the max-min (symmetric) setting might fail to recover the optimal full-information solution to the problem (i.e., a solution implemented by an oracle with complete prior knowledge of the follower’s objective function) in the asymmetric case. In contrast, we present a family of greedy and best-case policies that are able to recover the full-information optimal solution and also provide real-time certificates of optimality. In addition, we show that the proposed policies can be computed by solving a series of linear mixed-integer programs. We test policy performance through exhaustive numerical experiments in the context of asymmetric shortest path interdiction, considering various forms of feedback and several benchmark policies.","PeriodicalId":73382,"journal":{"name":"INFORMS journal on optimization","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Learning in Sequential Bilevel Linear Programming\",\"authors\":\"J. S. Borrero, O. Prokopyev, Denis Sauré\",\"doi\":\"10.1287/ijoo.2021.0063\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider a framework for sequential bilevel linear programming in which a leader and a follower interact over multiple time periods. In each period, the follower observes the actions taken by the leader and reacts optimally, according to the follower’s own objective function, which is initially unknown to the leader. By observing various forms of information feedback from the follower’s actions, the leader is able to refine the leader’s knowledge about the follower’s objective function and, hence, adjust the leader’s actions at subsequent time periods, which ought to help in maximizing the leader’s cumulative benefit. We show that greedy and robust policies adapted from previous work in the max-min (symmetric) setting might fail to recover the optimal full-information solution to the problem (i.e., a solution implemented by an oracle with complete prior knowledge of the follower’s objective function) in the asymmetric case. In contrast, we present a family of greedy and best-case policies that are able to recover the full-information optimal solution and also provide real-time certificates of optimality. In addition, we show that the proposed policies can be computed by solving a series of linear mixed-integer programs. 
We test policy performance through exhaustive numerical experiments in the context of asymmetric shortest path interdiction, considering various forms of feedback and several benchmark policies.\",\"PeriodicalId\":73382,\"journal\":{\"name\":\"INFORMS journal on optimization\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"INFORMS journal on optimization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1287/ijoo.2021.0063\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"INFORMS journal on optimization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1287/ijoo.2021.0063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We consider a framework for sequential bilevel linear programming in which a leader and a follower interact over multiple time periods. In each period, the follower observes the actions taken by the leader and reacts optimally according to its own objective function, which is initially unknown to the leader. By observing various forms of feedback from the follower's actions, the leader refines its knowledge of the follower's objective function and can thus adjust its actions in subsequent periods so as to maximize its cumulative benefit. We show that greedy and robust policies adapted from previous work in the max-min (symmetric) setting may fail to recover the optimal full-information solution to the problem (i.e., the solution an oracle with complete prior knowledge of the follower's objective function would implement) in the asymmetric case. In contrast, we present a family of greedy and best-case policies that recover the full-information optimal solution and also provide real-time certificates of optimality. In addition, we show that the proposed policies can be computed by solving a series of linear mixed-integer programs. We test policy performance through extensive numerical experiments in the context of asymmetric shortest-path interdiction, considering various forms of feedback and several benchmark policies.
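To make the leader-follower interaction loop concrete, below is a minimal toy sketch in Python of sequential shortest-path interdiction with a greedy leader policy. It is not the paper's method: it uses the simpler symmetric max-min variant (precisely the setting the abstract says is insufficient for the asymmetric case), replaces the paper's linear mixed-integer programs with brute-force enumeration over small interdiction sets, and assumes a hypothetical feedback model in which the leader observes the true cost of every arc the follower traverses. All function names and parameters (`greedy_interdiction`, `run_periods`, `penalty`, the interval bounds `hi`) are illustrative inventions, not taken from the paper.

```python
# Minimal toy sketch of sequential shortest-path interdiction with a greedy,
# optimistic leader policy. Hypothetical simplification for illustration only:
# it is NOT the policy family of Borrero, Prokopyev, and Saure, which handles
# the asymmetric case and is computed via linear mixed-integer programs.
import heapq
from itertools import combinations


def shortest_path(arcs, costs, s, t):
    """Dijkstra over a dict mapping (u, v) -> arc index into `costs`.
    Returns (path length, list of arc indices on a shortest s-t path)."""
    adj = {}
    for (u, v), i in arcs.items():
        adj.setdefault(u, []).append((v, i))
    dist, prev, heap = {s: 0.0}, {}, [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, i in adj.get(u, ()):
            nd = d + costs[i]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, i)
                heapq.heappush(heap, (nd, v))
    path, node = [], t
    while node != s:                 # walk predecessors back to the source
        node, i = prev[node]
        path.append(i)
    return dist[t], path[::-1]


def greedy_interdiction(arcs, hi, s, t, budget, penalty):
    """Greedy leader move: enumerate all interdiction sets within budget
    (brute force stands in for the paper's mixed-integer programs) and pick
    the one maximizing the follower's shortest path as anticipated under the
    leader's current optimistic cost upper bounds `hi`."""
    best_val, best_set = -1.0, frozenset()
    for k in range(budget + 1):
        for combo in combinations(range(len(hi)), k):
            c = [hi[i] + (penalty if i in combo else 0.0)
                 for i in range(len(hi))]
            val, _ = shortest_path(arcs, c, s, t)
            if val > best_val:
                best_val, best_set = val, frozenset(combo)
    return best_set


def run_periods(arcs, true_costs, hi, s, t, budget, penalty, periods):
    for period in range(1, periods + 1):
        x = greedy_interdiction(arcs, hi, s, t, budget, penalty)
        # Follower reacts optimally under its TRUE costs, unknown to leader.
        realized = [true_costs[i] + (penalty if i in x else 0.0)
                    for i in range(len(true_costs))]
        val, path = shortest_path(arcs, realized, s, t)
        # Assumed feedback model: the leader observes the true cost of each
        # arc the follower traverses, tightening those upper bounds exactly.
        for i in path:
            hi[i] = true_costs[i]
        print(f"period {period}: interdicted {sorted(x)}, "
              f"follower used arcs {path}, realized value {val:.1f}")


if __name__ == "__main__":
    # Tiny 4-node instance: source 0, sink 3; arc -> index into cost vectors.
    arcs = {(0, 1): 0, (0, 2): 1, (1, 3): 2, (2, 3): 3, (1, 2): 4}
    true_costs = [2.0, 4.0, 3.0, 1.0, 1.0]   # hidden from the leader
    hi = [6.0] * len(true_costs)             # leader's initial upper bounds
    run_periods(arcs, true_costs, hi, s=0, t=3, budget=1, penalty=10.0,
                periods=3)
```

On this toy instance, the leader's optimistic upper-bound estimates tighten each period as arcs are observed, so the greedy policy's interdiction choices converge toward the full-information solution; this mirrors the learning dynamic the abstract describes, while the paper's actual policies additionally handle the asymmetric case and come with real-time optimality certificates.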