Optimal sequential stochastic shortest path interdiction
Juan S. Borrero, Denis Sauré, Natalia Trigo
European Journal of Operational Research (JCR Q1, Operations Research & Management Science; impact factor 6.0), published 2025-04-16
DOI: 10.1016/j.ejor.2025.04.009 (https://doi.org/10.1016/j.ejor.2025.04.009)
We consider the periodic interaction between a leader and a follower in the context of network interdiction where, in each period, the leader first temporarily blocks passage through a subset of arcs in a network, and the follower then traverses the shortest path in the interdicted network. We assume that arc costs are stochastic and that their underlying distribution is known to the follower but not to the leader. We cast the leader's problem, which is to maximize the cumulative cost incurred by the follower, in the multi-armed bandit framework. This setting differs from the traditional bandit in that the feedback elicited by playing an arm is the reaction of an adversarial agent. After establishing a fundamental limit on the performance achievable by any admissible policy, we adapt traditional policies developed for linear bandits to our setting. We show that a critical step in this adaptation is to ensure that the cost vectors imputed by these algorithms lie within a polyhedron characterizing the information that can be collected without noise and in finite time. Within this polyhedron, the problem can be mapped into a linear bandit. The polyhedron has exponentially many constraints in the worst case, which are handled indirectly by solving several mathematical programs. We test the proposed policies and relevant benchmarks in a set of numerical experiments. Our results show that the adapted policies can significantly outperform the base policies, at the price of increased computational complexity.
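To make the single-period interaction in the abstract concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. The toy network, the `MEAN_COSTS` values, the cardinality `budget` on blocked arcs, and the rule that the leader may not disconnect the target are all hypothetical choices made for illustration. The follower is assumed to route on expected arc costs, and the leader is assumed to know those costs, which gives the full-information benchmark that a bandit policy would have to learn toward.

```python
import heapq
from itertools import combinations

# Hypothetical expected arc costs on a toy network (not from the paper).
MEAN_COSTS = {
    ("s", "a"): 1.0, ("s", "b"): 2.0,
    ("a", "t"): 1.0, ("b", "t"): 2.0,
    ("a", "b"): 0.5,
}

def shortest_path(arcs, source, target, blocked=frozenset()):
    """Follower's response: Dijkstra over arcs that are not blocked.

    Returns (cost, path), where path is a list of arcs; cost is inf and
    path is empty when the target is unreachable.
    """
    adj = {}
    for (u, v), c in arcs.items():
        if (u, v) not in blocked:
            adj.setdefault(u, []).append((v, c))
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path, node = [], target
    while node != source:
        path.append((prev[node], node))
        node = prev[node]
    return dist[target], path[::-1]

def best_interdiction(arcs, source, target, budget):
    """Leader's full-information benchmark by brute-force enumeration.

    Tries every set of at most `budget` blocked arcs and keeps the one
    that maximizes the follower's shortest-path cost. Sets that
    disconnect the target are skipped (an assumption of this sketch).
    """
    best = (float("-inf"), frozenset(), [])
    for k in range(budget + 1):
        for blocked in combinations(sorted(arcs), k):
            cost, path = shortest_path(arcs, source, target, frozenset(blocked))
            if cost != float("inf") and cost > best[0]:
                best = (cost, frozenset(blocked), path)
    return best

cost, blocked, path = best_interdiction(MEAN_COSTS, "s", "t", budget=1)
print(cost, sorted(blocked), path)
```

In the sequential setting described in the abstract, the leader would not have access to `MEAN_COSTS` and would instead have to infer costs from the paths the follower chooses and the costs realized along them. The brute-force enumeration is only viable on tiny networks and stands in for whatever mathematical program one would use at scale.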
About the journal:
The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.