Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery

Patrick Saux
Source: arXiv - MATH - Statistics Theory. Published: 2024-05-03. DOI: arxiv-2405.01994 (https://doi.org/arxiv-2405.01994)

Abstract

This thesis studies some of the mathematical challenges that arise in the analysis of statistical sequential decision-making algorithms for postoperative patient follow-up. Stochastic bandits (multi-armed, contextual) model how an agent learns a sequence of actions (a policy) in an uncertain environment in order to maximise observed rewards. To learn optimal policies, bandit algorithms have to balance the exploitation of current knowledge against the exploration of uncertain actions. Such algorithms have mostly been studied and deployed in industrial applications with large datasets, low-risk decisions and clear modelling assumptions, such as click-through rate maximisation in online advertising. By contrast, digital health recommendations call for a different paradigm of small samples, risk-averse agents and complex, nonparametric modelling. To this end, we developed new safe, anytime-valid concentration bounds (Bregman, empirical Chernoff), introduced a new framework for risk-aware contextual bandits (with elicitable risk measures) and analysed a novel class of nonparametric bandit algorithms under weak assumptions (Dirichlet sampling). In addition to theoretical guarantees, these results are supported by in-depth empirical evidence. Finally, as a first step towards personalised postoperative follow-up recommendations, we developed, together with medical doctors and surgeons, an interpretable machine learning model to predict the long-term weight trajectories of patients after bariatric surgery.
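To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of the classical UCB1 index policy on Bernoulli arms. It is a textbook baseline shown for illustration only, not one of the algorithms developed in the thesis; the function name `ucb1`, the arm means, the horizon and the seed are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run classical UCB1 on simulated Bernoulli arms; return pull counts per arm.

    arm_means, horizon and seed are illustrative inputs, not values from the thesis.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms  # number of pulls of each arm so far
    sums = [0.0] * n_arms  # cumulative reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Simulated Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1(arm_means=[0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

The confidence bonus shrinks as an arm is pulled more often, so rarely tried arms keep being explored while the arm with the best empirical mean receives most of the pulls. This baseline assumes bounded rewards and targets the mean reward only, which is the kind of setting the abstract contrasts with small-sample, risk-averse applications.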