Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning

Qingqing Wang, Chang Chang
arXiv - QuanBio - Quantitative Methods, published 2024-09-17. DOI: arxiv-2409.11576 (https://doi.org/arxiv-2409.11576)

Abstract

Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming and experience-demanding task where a large number of planning objectives are involved. Deep reinforcement learning (DRL) has recently been introduced to the planning processes of intensity-modulated radiation therapy and brachytherapy for prostate, lung, and cervical cancers. However, existing approaches are built upon the Q-learning framework and weighted linear combinations of clinical metrics, suffering from poor scalability and flexibility and only capable of adjusting a limited number of planning objectives in discrete action spaces. We propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function for proton PBS treatment planning of H&N cancers. Specifically, a set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters in a continuous action space and refine the PBS treatment plans using a novel dose distribution-based reward function. Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage when compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method can be successfully generalized to other treatment sites. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.