Reinforcement learning with mechanistic models to optimise radiotherapy and immunotherapy combinations: a proof of concept.

IF 3.4 | Zone 3 (Medicine) | JCR Q2, ENGINEERING, BIOMEDICAL

Physics in Medicine and Biology | Pub Date: 2025-09-25 | DOI: 10.1088/1361-6560/ae0863

Allison M Ng, Du Q Huynh, Rebecca A D'Alonzo, Synat Keam, Pejman Rowshanfarzad, Anna K Nowak, Suki Gill, Alistair M Cook, Martin A Ebert

Citations: 0

Abstract

Objective. To investigate the use of reinforcement learning (RL) algorithms to optimise complex combination cancer therapies. The RL algorithm investigated the effect of varying the radiotherapy (RT) dose in each fraction when administered in conjunction with the immune checkpoint inhibitors (ICIs) anti-PD-1 and anti-CTLA-4.

Approach. Data were available for BALB/c mice inoculated with a syngeneic mesothelioma tumour on the flank and treated with combination RT and ICI, with tumour growth subsequently measured. A deep Q-network (DQN) and a double DQN were trained using a mechanistic model fitted to the mesothelioma volumes to simulate the dynamics of the tumour microenvironment. Two reward functions were created for the RL algorithm to optimise: the first considered only tumour cell killing, while the second penalised treatment schedules with higher total RT dose. Comparison with experimental results was via the tumour control probability (TCP).

Main Results. All the TCPs obtained with the RL algorithm exceeded the TCPs obtained with the same mechanistic model when only 1 or 2 fractions of RT were administered. However, the baseline schedule of 2 Gy per fraction outperformed the treatment schedules generated by RL.

Significance. This study highlights the potential for RL to explore the vast solution space of possible treatment schedules, conceivably at the individual patient level.
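The abstract names two reward functions and two learners (DQN and double DQN) but gives no formulas. The Python sketch below is only an illustration of how such pieces commonly look, not the authors' implementation. The function names, the fractional-kill reward, the linear per-Gy penalty, and all numerical values are assumptions introduced here. The two target functions follow the standard textbook definitions of DQN and double DQN.

```python
import numpy as np


def reward_kill_only(cells_before, cells_after):
    """Assumed form of reward 1: fraction of tumour cells killed in one step."""
    return (cells_before - cells_after) / cells_before


def reward_dose_penalised(cells_before, cells_after, dose_gy, penalty_per_gy=0.01):
    """Assumed form of reward 2: kill term minus a penalty proportional to dose.

    Summed over a schedule, the penalty grows with the total RT dose.
    penalty_per_gy is a hypothetical weight, not a value from the paper.
    """
    return reward_kill_only(cells_before, cells_after) - penalty_per_gy * dose_gy


def dqn_target(reward, q_target_next, gamma=0.99, done=False):
    """Standard DQN target: the target network selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * float(np.max(q_target_next))


def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = int(np.argmax(q_online_next))
    return reward + gamma * float(q_target_next[best_action])


if __name__ == "__main__":
    # Hypothetical Q-values for three discrete dose choices in the next state.
    q_online = np.array([1.0, 3.0, 2.0])
    q_target = np.array([2.5, 1.5, 2.0])
    r = reward_dose_penalised(1e6, 4e5, dose_gy=8.0)
    print("reward:", r)
    print("DQN target:", dqn_target(r, q_target, gamma=0.9))
    print("double DQN target:", double_dqn_target(r, q_online, q_target, gamma=0.9))
```

In this sketch each action would correspond to one discrete RT dose for the next fraction, which is itself an assumption. Decoupling action selection from action evaluation is the usual motivation for double DQN, since it reduces the overestimation that the plain maximum in DQN can introduce.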

Source journal: Physics in Medicine and Biology (Medicine; Engineering: Biomedical)

CiteScore: 6.50

Self-citation rate: 14.30%

Articles published: 409

Review time: 2 months

Journal description: The development and application of theoretical, computational and experimental physics to medicine, physiology and biology. Topics covered are: therapy physics (including ionizing and non-ionizing radiation); biomedical imaging (e.g. x-ray, magnetic resonance, ultrasound, optical and nuclear imaging); image-guided interventions; image reconstruction and analysis (including kinetic modelling); artificial intelligence in biomedical physics and analysis; nanoparticles in imaging and therapy; radiobiology; radiation protection and patient dose monitoring; radiation dosimetry