Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning

arXiv - QuanBio - Quantitative Methods | Pub Date: 2024-09-17 | DOI: arxiv-2409.11576

Qingqing Wang, Chang Chang



Abstract

Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming and experience-demanding task where a large number of planning objectives are involved. Deep reinforcement learning (DRL) has recently been introduced to the planning processes of intensity-modulated radiation therapy and brachytherapy for prostate, lung, and cervical cancers. However, existing approaches are built upon the Q-learning framework and weighted linear combinations of clinical metrics, suffering from poor scalability and flexibility and only capable of adjusting a limited number of planning objectives in discrete action spaces. We propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function for proton PBS treatment planning of H&N cancers. Specifically, a set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters in a continuous action space and refine the PBS treatment plans using a novel dose distribution-based reward function. Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage when compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method can be successfully generalized to other treatment sites. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.
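
The abstract names PPO acting in a continuous action space but gives no implementation details. As a rough illustration only, the sketch below shows the standard PPO clipped surrogate objective together with a diagonal Gaussian policy log-density, which is one common way to parameterize continuous actions. Everything here is an assumption for illustration: the function names, the use of NumPy, the `clip_eps` default of 0.2, and the idea that each action dimension stands for an adjustment to one planning objective parameter. The paper's dose distribution-based reward is not specified in the abstract, so advantages are simply taken as given inputs. This is not the authors' code.

```python
import numpy as np


def gaussian_log_prob(actions, mean, log_std):
    """Log-density of a diagonal Gaussian policy, summed over action dimensions.

    Hypothetical setup: each action dimension is a continuous adjustment
    to one planning objective parameter.
    """
    std = np.exp(log_std)
    z = (actions - mean) / std
    return np.sum(-0.5 * z**2 - log_std - 0.5 * np.log(2.0 * np.pi), axis=-1)


def ppo_clipped_objective(new_log_prob, old_log_prob, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    The elementwise minimum removes any incentive to move the ratio
    outside the interval [1 - clip_eps, 1 + clip_eps].
    """
    ratio = np.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    # Toy batch: 4 samples, 3 hypothetical planning-objective parameters.
    rng = np.random.default_rng(0)
    actions = rng.normal(size=(4, 3))
    old_lp = gaussian_log_prob(actions, np.zeros(3), np.zeros(3))
    new_lp = gaussian_log_prob(actions, np.full(3, 0.1), np.zeros(3))
    advantages = rng.normal(size=4)  # placeholder values, not a real reward
    print(ppo_clipped_objective(new_lp, old_lp, advantages))
```

Working with log-probabilities keeps the ratio numerically stable. When the new and old policies coincide, the ratio is 1 and the objective reduces to the mean advantage.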


