Optimal Guarantees for Online Selection Over Time
Sebastian Perez-Salazar, Victor Verdugo
arXiv - ECON - Theoretical Economics, 2024-08-20
DOI: arxiv-2408.11224 (https://doi.org/arxiv-2408.11224)
Citations: 0
Abstract
Prophet inequalities are a cornerstone of optimal stopping and online
decision-making. Traditionally, a decision-maker sequentially observes $n$
non-negative independent random variables and faces irrevocable accept-or-reject
choices. The goal is to design policies that achieve a good approximation
ratio against the optimal offline solution that can access all the values
upfront, the so-called prophet value. In the prophet inequality over time
problem (POT), the decision-maker can commit to an accepted value for $\tau$
units of time, during which no new values can be accepted. This creates a
trade-off between the duration of commitment and the opportunity to capture
potentially higher future values.

In this work, we provide best-possible worst-case approximation ratios in the
IID setting of POT for single-threshold algorithms and the optimal dynamic
programming policy. We show a single-threshold algorithm that achieves an
approximation ratio of $(1+e^{-2})/2\approx 0.567$, and we prove that no
single-threshold algorithm can surpass this guarantee. With our techniques, we
can analyze simple algorithms using $k$ thresholds and show that with $k=3$ it
is possible to get an approximation ratio larger than $\approx 0.602$. Then,
for each $n$, we prove it is possible to compute the tight worst-case
approximation ratio of the optimal dynamic programming policy for instances
with $n$ values by solving a convex optimization program. A limit analysis of
the first-order optimality conditions yields a nonlinear differential equation
showing that the optimal dynamic programming policy's asymptotic worst-case
approximation ratio is $\approx 0.618$. Finally, we extend the discussion to
adversarial settings and show an optimal worst-case approximation ratio of
$\approx 0.162$ when the values are streamed in random order.
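
As a quick arithmetic check of the closed form quoted above, the following minimal Python sketch evaluates $(1+e^{-2})/2$ and places it next to the other ratios reported in the abstract. The function name `single_threshold_ratio` and the dictionary labels are illustrative choices, not names from the paper, and the values 0.602, 0.618 and 0.162 are simply copied from the abstract rather than recomputed.

```python
import math


def single_threshold_ratio() -> float:
    """Evaluate the closed form (1 + e^{-2}) / 2 quoted in the abstract."""
    return (1 + math.exp(-2)) / 2


# Approximate values copied from the abstract as reported; they are not
# derived here. The labels are illustrative, not the paper's terminology.
reported_ratios = {
    "three thresholds, IID (lower bound)": 0.602,
    "optimal dynamic program, IID (asymptotic)": 0.618,
    "random order": 0.162,
}

if __name__ == "__main__":
    print(f"single threshold, IID: {single_threshold_ratio():.4f}")
    for label, ratio in reported_ratios.items():
        print(f"{label}: {ratio:.3f}")
```

To four decimals the closed form evaluates to 0.5677, so the abstract's $\approx 0.567$ is a truncation rather than a rounding of that value.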