Achieving highly efficient treatment planning in intensity-modulated radiotherapy (IMRT) is challenging due to the complex interactions between radiation beams and the human body. The introduction of artificial intelligence (AI) has enabled automated treatment planning, significantly improving efficiency. However, existing automatic treatment planning agents often rely on supervised or unsupervised AI models that require large volumes of high-quality patient data for training. Additionally, these networks are generally not universally applicable to patient cases from different institutions and can be vulnerable to adversarial attacks. Deep reinforcement learning (DRL), which mimics the trial-and-error process used by human planners, offers a promising approach to address these challenges.
This work aims to develop a stochastic policy-based DRL agent for automatic treatment planning that trains effectively on limited datasets, applies universally across diverse patient datasets, and performs robustly under adversarial attacks.
We employ an actor–critic with experience replay (ACER) architecture to develop the automatic treatment planning agent. The agent operates the treatment planning system (TPS) for inverse treatment planning by automatically tuning treatment planning parameters (TPPs). We use prostate cancer IMRT patient cases as our testbed; each case includes one target and two organs at risk (OARs), and the agent selects among 18 discrete TPP tuning actions. The network takes dose–volume histograms (DVHs) as input and outputs a stochastic policy for effective TPP tuning, accompanied by an evaluation function for that policy. Training uses DVHs from treatment plans generated by an in-house TPS under randomized TPPs for a single patient case, with validation conducted on two other independent cases. Both online asynchronous learning and offline, sample-efficient experience replay are employed to update the network parameters. After training, six groups comprising more than 300 initial treatment plans drawn from three datasets are used for testing; these groups have beam and anatomical configurations distinct from those of the training case. The ProKnow scoring system for prostate cancer IMRT, with a maximum score of 9, is used to evaluate plan quality. The robustness of the network is further assessed through adversarial attacks using the fast gradient sign method (FGSM).
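To make the setup concrete, the sketch below illustrates one way such an actor–critic network could map a DVH observation to a stochastic policy over the 18 discrete TPP tuning actions, alongside a per-action value head of the kind ACER evaluates. The class name, layer sizes, and DVH discretization (num_bins) are illustrative assumptions, not the architecture reported here.

```python
import torch
import torch.nn as nn

class ACERPlanningAgent(nn.Module):
    """Minimal sketch of an actor-critic network for TPP tuning.

    Assumptions: DVHs for 3 structures (1 target + 2 OARs) are each
    sampled at `num_bins` dose points; layer sizes are illustrative.
    """

    def __init__(self, num_structures=3, num_bins=100, num_actions=18):
        super().__init__()
        # Shared encoder over the concatenated DVH curves.
        self.encoder = nn.Sequential(
            nn.Linear(num_structures * num_bins, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
        )
        # Actor head: stochastic policy over discrete TPP tuning actions.
        self.policy_head = nn.Linear(128, num_actions)
        # Critic head: per-action values Q(s, a), as used in ACER.
        self.q_head = nn.Linear(128, num_actions)

    def forward(self, dvh):
        h = self.encoder(dvh.flatten(start_dim=1))
        policy = torch.softmax(self.policy_head(h), dim=-1)
        return policy, self.q_head(h)

# Sampling a TPP tuning action from the stochastic policy:
agent = ACERPlanningAgent()
dvh = torch.rand(1, 3, 100)                        # placeholder DVH batch
policy, q_values = agent(dvh)
action = torch.multinomial(policy, num_samples=1)  # stochastic action choice
```

A stochastic policy of this form is what allows ACER to combine asynchronous on-policy updates with importance-weighted replay of off-policy trajectories.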
Despite being trained on treatment plans from a single patient case, the network converges efficiently when validated on the two independent cases. In testing, the mean ± standard deviation of the plan scores across all test cases before ACER-based treatment planning is . After ACER-based treatment planning, of the cases achieve a perfect score of 9, only score between 8 and 9, and none score below 7; the corresponding mean ± standard deviation is . This performance highlights the ACER agent's generality across patient data from various sources. Further analysis indicates that the agent assigns reasonable TPP tuning actions probabilities several orders of magnitude higher than obviously unsuitable ones, demonstrating its efficacy. Additionally, results from FGSM attacks show that the ACER-based agent remains comparatively robust under various levels of perturbation.
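For reference, FGSM perturbs an input $x$ along the sign of the loss gradient, $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x L(x))$. A minimal sketch of such an attack on the policy network above follows; the loss choice (negative log-probability of the currently preferred action) and the clamp of DVH values to [0, 1] are illustrative assumptions, as they are not specified here.

```python
import torch

def fgsm_perturb(agent, dvh, epsilon):
    """Minimal FGSM sketch against the policy network.

    Assumed loss: negative log-probability of the action the policy
    currently prefers, so the perturbation degrades that preference.
    """
    dvh = dvh.clone().detach().requires_grad_(True)
    policy, _ = agent(dvh)
    preferred = policy.argmax(dim=-1, keepdim=True)
    loss = -torch.log(policy.gather(-1, preferred)).mean()
    loss.backward()
    # One step of size epsilon along the sign of the input gradient;
    # clamping assumes DVH values are normalized volume fractions.
    adversarial = dvh + epsilon * dvh.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Probing robustness at increasing perturbation levels:
for eps in (0.01, 0.05, 0.1):
    adv_dvh = fgsm_perturb(agent, dvh, eps)
```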
We successfully trained a DRL agent using the ACER technique for high-quality treatment planning in prostate cancer IMRT. The agent generalizes well across diverse patient datasets and remains robust against adversarial attacks.