Patrick Saux. arXiv:2405.01994, arXiv - MATH - Statistics Theory, published 2024-05-03.
Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery
This thesis aims to study some of the mathematical challenges that arise in
the analysis of statistical sequential decision-making algorithms for
postoperative patient follow-up. Stochastic bandits (multi-armed, contextual)
model the learning of a sequence of actions (policy) by an agent in an
uncertain environment in order to maximise observed rewards. To learn optimal
policies, bandit algorithms have to balance the exploitation of current
knowledge and the exploration of uncertain actions. Such algorithms have
largely been studied and deployed in industrial applications with large
datasets, low-risk decisions and clear modelling assumptions, such as
clickthrough rate maximisation in online advertising. By contrast, digital
health recommendations call for a whole new paradigm of small samples,
risk-averse agents and complex, nonparametric modelling. To this end, we
developed new safe, anytime-valid concentration bounds (Bregman, empirical
Chernoff), introduced a new framework for risk-aware contextual bandits (with
elicitable risk measures) and analysed a novel class of nonparametric bandit
algorithms under weak assumptions (Dirichlet sampling). In addition to the
theoretical guarantees, these results are supported by in-depth empirical
evidence. Finally, as a first step towards personalised postoperative follow-up
recommendations, we developed, in collaboration with medical doctors and surgeons, an
interpretable machine learning model to predict the long-term weight
trajectories of patients after bariatric surgery.
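The exploration-exploitation trade-off mentioned in the abstract is classically handled by index policies. As an illustrative sketch only (this is the standard UCB1 policy on simulated Bernoulli arms, not the Bregman/Chernoff bounds or Dirichlet-sampling algorithms developed in the thesis), each round the agent plays the arm maximising its empirical mean plus an exploration bonus that shrinks as the arm is sampled:

```python
import math
import random

def ucb1(arm_means, horizon=5000, seed=0):
    """Simulate the UCB1 policy on Bernoulli arms.

    Returns the pull counts per arm and the cumulative reward.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialise
        else:
            # empirical mean (exploitation) + Hoeffding-style bonus (exploration)
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward

counts, reward = ucb1([0.3, 0.5, 0.7])
```

On this instance the best arm (index 2, mean 0.7) should accumulate the large majority of the pulls, since the exploration bonus of suboptimal arms eventually fails to overcome their lower empirical means.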