Sabrina Khurshid, Mohammed Shahid Abdulla, Gourab Ghatak
arXiv - QuantFin - Portfolio Management, published 2024-05-28. DOI: arxiv-2406.06552
Citations: 0
Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits
The Sharpe Ratio (SR) is a key measure for characterizing financial time series
because it jointly accounts for the reward of a stock or portfolio and its
volatility, captured through the variance. Deriving online algorithms that
optimize the SR is particularly challenging, since even offline policies incur
constant regret with respect to the best expert (Even-Dar et al., 2006). Thus,
instead of optimizing the usual definition of the SR, we optimize the
regularized squared SR (RSSR). We consider two settings for the RSSR: Regret
Minimization (RM) and Best Arm Identification (BAI). For RM, we propose
UCB-RSSR, a novel multi-armed bandit (MAB) algorithm for RSSR maximization. We
derive a path-dependent concentration bound for the RSSR estimate and, based on
it, derive regret guarantees for UCB-RSSR, showing that its regret grows as
O(log n) in the two-armed bandit case played over a horizon n. For BAI, we
consider the fixed-budget setting of two well-known algorithms, sequential
halving and successive rejects, and propose the SHVV, SHSR, and SuRSR
algorithms. We derive upper bounds on the error probability of all proposed BAI
algorithms. We demonstrate that UCB-RSSR outperforms U-UCB (Cassel et al.,
2023), the only other known SR-optimizing bandit algorithm, and we establish its
efficacy against other benchmarks derived from the GRA-UCB and MVTS algorithms.
We further evaluate the proposed BAI algorithms across several different
setups. Our research highlights that the proposed algorithms will find
extensive applications in risk-aware portfolio management problems.
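The abstract does not spell out the RSSR formula, the UCB-RSSR index, or how SHSR ranks arms. As a minimal sketch under stated assumptions, the Python below takes RSSR to be mean² / (variance + λ) with a hypothetical regularizer `lam`, uses a generic sqrt(2 log t / n) exploration bonus in place of the paper's path-dependent concentration bound, and ranks arms by this score inside standard sequential halving (Karnin et al., 2013), which splits a budget T as floor(T / (|S_r| · ceil(log2 K))) pulls per surviving arm per round. Because squaring discards the sign of the mean, the sketch is only meaningful when arms have positive expected returns. The names `rssr`, `ucb_rssr_sketch`, `sequential_halving_rssr`, and `gaussian` are illustrative and are not the authors' code.

```python
import math
import random
from typing import Callable, List, Sequence

# An arm is modeled as a function that draws one return sample.
Arm = Callable[[random.Random], float]


def rssr(samples: Sequence[float], lam: float = 1.0) -> float:
    """ASSUMED regularized squared Sharpe ratio: mean^2 / (variance + lam).

    `lam` is a hypothetical regularizer that keeps the ratio finite when
    the empirical variance is zero; the abstract gives no exact formula.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n  # plug-in (1/n) variance
    return mean * mean / (var + lam)


def ucb_rssr_sketch(arms: List[Arm], horizon: int, rng: random.Random,
                    c: float = 1.0) -> List[int]:
    """Regret minimization: play the arm maximizing empirical RSSR plus a
    generic exploration bonus. The bonus is a placeholder, NOT the paper's
    path-dependent concentration bound. Returns pull counts per arm."""
    history: List[List[float]] = [[] for _ in arms]
    for t in range(1, horizon + 1):
        if t <= len(arms):
            i = t - 1  # initialization: pull every arm once
        else:
            i = max(range(len(arms)),
                    key=lambda k: rssr(history[k])
                    + c * math.sqrt(2 * math.log(t) / len(history[k])))
        history[i].append(arms[i](rng))
    return [len(h) for h in history]


def sequential_halving_rssr(arms: List[Arm], budget: int,
                            rng: random.Random) -> int:
    """Fixed-budget BAI: sequential halving that ranks surviving arms by
    the empirical RSSR of the current round's samples. Returns an arm index."""
    survivors = list(range(len(arms)))
    rounds = math.ceil(math.log2(len(arms)))
    for _ in range(rounds):
        # Split the budget evenly across rounds and surviving arms.
        pulls = max(1, budget // (len(survivors) * rounds))
        scores = {i: rssr([arms[i](rng) for _ in range(pulls)])
                  for i in survivors}
        survivors.sort(key=lambda i: scores[i], reverse=True)
        survivors = survivors[:math.ceil(len(survivors) / 2)]  # keep top half
    return survivors[0]


def gaussian(mu: float, sd: float) -> Arm:
    """Hypothetical test arm with i.i.d. Gaussian returns."""
    return lambda rng: rng.gauss(mu, sd)


if __name__ == "__main__":
    arms = [gaussian(1.0, 0.1), gaussian(0.2, 0.1),
            gaussian(0.5, 2.0), gaussian(1.0, 3.0)]
    print("SH pick:", sequential_halving_rssr(arms, budget=400,
                                              rng=random.Random(0)))
    print("UCB pulls:", ucb_rssr_sketch(arms, horizon=500,
                                        rng=random.Random(1)))
```

Under this assumed score, a high-mean, high-variance arm such as `gaussian(1.0, 3.0)` (RSSR 1/(9 + 1) = 0.1) ranks well below a low-variance arm with the same mean such as `gaussian(1.0, 0.1)` (about 0.99). This is the kind of risk adjustment the RSSR objective is meant to capture.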