Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits
Sabrina Khurshid, Mohammed Shahid Abdulla, Gourab Ghatak
arXiv - QuantFin - Portfolio Management, 2024-05-28. https://doi.org/arxiv-2406.06552
Citations: 0
Abstract
The Sharpe Ratio (SR) is a critical parameter in characterizing financial time
series as it jointly considers the reward and the volatility of any
stock/portfolio through its variance. Deriving online algorithms for optimizing
the SR is particularly challenging since even offline policies experience
constant regret with respect to the best expert (Even-Dar et al., 2006). Thus,
instead of optimizing the usual definition of SR, we optimize regularized
square SR (RSSR). We consider two settings for the RSSR: Regret Minimization
(RM) and Best Arm Identification (BAI). For RM, we propose UCB-RSSR, a novel
multi-armed bandit (MAB) algorithm that maximizes the RSSR.
We derive a path-dependent concentration bound for the estimate
of the RSSR. Based on this bound, we derive regret guarantees for UCB-RSSR and
show that its regret grows as O(log n) in the two-armed bandit case played over a
horizon n. We also consider a fixed-budget setting for well-known BAI
algorithms, i.e., sequential halving and successive rejects, and propose SHVV,
SHSR, and SuRSR algorithms. We derive upper bounds on the error probability
of all proposed BAI algorithms. We demonstrate that UCB-RSSR outperforms the
only other known SR-optimizing bandit algorithm, U-UCB (Cassel et al., 2023). We
also establish its efficacy with respect to other benchmarks derived from the
GRA-UCB and MVTS algorithms. We further demonstrate the performance of the proposed
BAI algorithms across several setups. Our research highlights that our
proposed algorithms will find extensive applications in risk-aware portfolio
management problems.
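
To make the fixed-budget BAI setting concrete, the following is a minimal Python sketch of sequential halving driven by an empirical RSSR estimate. It is not the paper's SHVV, SHSR, or SuRSR algorithm. The abstract does not state the RSSR formula, so the form mean^2 / (reg + variance), the constant `reg`, the even budget split, and the names `rssr` and `sequential_halving` are all assumptions made for illustration.

```python
import math
import random
import statistics


def rssr(rewards, reg=1.0):
    """Empirical regularized squared Sharpe ratio (assumed form).

    Assumption: RSSR = mean^2 / (reg + variance); `reg` is a hypothetical
    regularization constant, not a value taken from the paper.
    """
    mean = statistics.fmean(rewards)
    var = statistics.pvariance(rewards, mu=mean)
    return mean * mean / (reg + var)


def sequential_halving(arms, budget, reg=1.0):
    """Fixed-budget best-arm identification by sequential halving.

    `arms` is a list of zero-argument callables, each returning one reward.
    Returns the index of the arm that survives all elimination rounds.
    """
    active = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        if len(active) == 1:
            break
        # Split the budget evenly over rounds, then over surviving arms.
        pulls = max(2, budget // (len(active) * rounds))
        scores = {i: rssr([arms[i]() for _ in range(pulls)], reg) for i in active}
        # Keep the better half of the surviving arms (rounding up).
        active.sort(key=lambda i: scores[i], reverse=True)
        active = active[: math.ceil(len(active) / 2)]
    return active[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical Gaussian arms given as (mean, standard deviation).
    params = [(0.5, 1.0), (0.6, 0.2), (0.1, 0.1)]
    arms = [lambda m=m, s=s: rng.gauss(m, s) for m, s in params]
    print("selected arm:", sequential_halving(arms, budget=6000))
```

Each round re-samples the surviving arms and discards the half with the lowest estimated RSSR, so the whole budget is spent within about log2(K) rounds for K arms. Under the assumed formula, a high-mean arm with large variance can lose to a slightly lower-mean arm with small variance, which is the risk-adjusted behavior the abstract describes.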