{"title":"基于经验回放的前列腺癌调强放疗自动治疗计划的演员评论家。","authors":"Md Mainul Abrar, Parvat Sapkota, Damon Sprouts, Xun Jia, Yujie Chi","doi":"10.1002/mp.17915","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Achieving highly efficient treatment planning in intensity-modulated radiotherapy (IMRT) is challenging due to the complex interactions between radiation beams and the human body. The introduction of artificial intelligence (AI) has automated treatment planning, significantly improving efficiency. However, existing automatic treatment planning agents often rely on supervised or unsupervised AI models that require large datasets of high-quality patient data for training. Additionally, these networks are generally not universally applicable across patient cases from different institutions and can be vulnerable to adversarial attacks. Deep reinforcement learning (DRL), which mimics the trial-and-error process used by human planners, offers a promising new approach to address these challenges. </p>\n </section>\n \n <section>\n \n <h3> Purpose</h3>\n \n <p>This work aims to develop a stochastic policy-based DRL agent for automatic treatment planning that facilitates effective training with limited datasets, universal applicability across diverse patient datasets, and robust performance under adversarial attacks.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>We employ an actor–critic with experience replay (ACER) architecture to develop the automatic treatment planning agent. This agent operates the treatment planning system (TPS) for inverse treatment planning by automatically tuning treatment planning parameters (TPPs). We use prostate cancer IMRT patient cases as our testbed, which includes one target and two organs at risk (OARs), along with 18 discrete TPP tuning actions. The network takes dose–volume histograms (DVHs) as input and outputs a policy for effective TPP tuning, accompanied by an evaluation function for that policy. Training utilizes DVHs from treatment plans generated by an in-house TPS under randomized TPPs for a single patient case, with validation conducted on two other independent cases. Both online asynchronous learning and offline, sample-efficient experience replay methods are employed to update the network parameters. After training, six groups, comprising more than 300 initial treatment plans drawn from three datasets, were used for testing. These groups have beam and anatomical configurations distinct from those of the training case. The ProKnow scoring system for prostate cancer IMRT, with a maximum score of 9, is used to evaluate plan quality. The robustness of the network is further assessed through adversarial attacks using the fast gradient sign method (FGSM).</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Despite being trained on treatment plans from a single patient case, the network converges efficiently when validated on two independent cases. For testing performance, the mean <span></span><math>\n <semantics>\n <mo>±</mo>\n <annotation>$\\pm$</annotation>\n </semantics></math> standard deviation of the plan scores across all test cases before ACER-based treatment planning is <span></span><math>\n <semantics>\n <mrow>\n <mn>6.17</mn>\n <mo>±</mo>\n <mn>1.90</mn>\n </mrow>\n <annotation>$6.17 \\pm 1.90$</annotation>\n </semantics></math>. 
After implementing ACER-based treatment planning, <span></span><math>\n <semantics>\n <mrow>\n <mn>92.29</mn>\n <mo>%</mo>\n </mrow>\n <annotation>$92.29\\%$</annotation>\n </semantics></math> of the cases achieve a perfect score of 9, with only <span></span><math>\n <semantics>\n <mrow>\n <mn>6.65</mn>\n <mo>%</mo>\n </mrow>\n <annotation>$6.65\\%$</annotation>\n </semantics></math> scoring between 8 and 9, and no cases being below 7. The corresponding mean <span></span><math>\n <semantics>\n <mo>±</mo>\n <annotation>$\\pm$</annotation>\n </semantics></math> standard deviation is <span></span><math>\n <semantics>\n <mrow>\n <mn>8.92</mn>\n <mo>±</mo>\n <mn>0.29</mn>\n </mrow>\n <annotation>$8.92 \\pm 0.29$</annotation>\n </semantics></math>. This performance highlights the ACER agent's high generality across patient data from various sources. Further analysis indicates that the ACER agent effectively prioritizes leading reasonable TPP tuning actions over obviously unsuitable ones by several orders of magnitude, showing its efficacy. Additionally, results from FGSM attacks demonstrate that the ACER-based agent remains comparatively robust against various levels of perturbation.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>We successfully trained a DRL agent using the ACER technique for high-quality treatment planning in prostate cancer IMRT. It achieves high generality across diverse patient datasets and exhibits high robustness against adversarial attacks.</p>\n </section>\n </div>","PeriodicalId":18384,"journal":{"name":"Medical physics","volume":"52 7","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/mp.17915","citationCount":"0","resultStr":"{\"title\":\"Actor critic with experience replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy\",\"authors\":\"Md Mainul Abrar, Parvat Sapkota, Damon Sprouts, Xun Jia, Yujie Chi\",\"doi\":\"10.1002/mp.17915\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>Achieving highly efficient treatment planning in intensity-modulated radiotherapy (IMRT) is challenging due to the complex interactions between radiation beams and the human body. The introduction of artificial intelligence (AI) has automated treatment planning, significantly improving efficiency. However, existing automatic treatment planning agents often rely on supervised or unsupervised AI models that require large datasets of high-quality patient data for training. Additionally, these networks are generally not universally applicable across patient cases from different institutions and can be vulnerable to adversarial attacks. Deep reinforcement learning (DRL), which mimics the trial-and-error process used by human planners, offers a promising new approach to address these challenges. </p>\\n </section>\\n \\n <section>\\n \\n <h3> Purpose</h3>\\n \\n <p>This work aims to develop a stochastic policy-based DRL agent for automatic treatment planning that facilitates effective training with limited datasets, universal applicability across diverse patient datasets, and robust performance under adversarial attacks.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>We employ an actor–critic with experience replay (ACER) architecture to develop the automatic treatment planning agent. 
This agent operates the treatment planning system (TPS) for inverse treatment planning by automatically tuning treatment planning parameters (TPPs). We use prostate cancer IMRT patient cases as our testbed, which includes one target and two organs at risk (OARs), along with 18 discrete TPP tuning actions. The network takes dose–volume histograms (DVHs) as input and outputs a policy for effective TPP tuning, accompanied by an evaluation function for that policy. Training utilizes DVHs from treatment plans generated by an in-house TPS under randomized TPPs for a single patient case, with validation conducted on two other independent cases. Both online asynchronous learning and offline, sample-efficient experience replay methods are employed to update the network parameters. After training, six groups, comprising more than 300 initial treatment plans drawn from three datasets, were used for testing. These groups have beam and anatomical configurations distinct from those of the training case. The ProKnow scoring system for prostate cancer IMRT, with a maximum score of 9, is used to evaluate plan quality. The robustness of the network is further assessed through adversarial attacks using the fast gradient sign method (FGSM).</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>Despite being trained on treatment plans from a single patient case, the network converges efficiently when validated on two independent cases. For testing performance, the mean <span></span><math>\\n <semantics>\\n <mo>±</mo>\\n <annotation>$\\\\pm$</annotation>\\n </semantics></math> standard deviation of the plan scores across all test cases before ACER-based treatment planning is <span></span><math>\\n <semantics>\\n <mrow>\\n <mn>6.17</mn>\\n <mo>±</mo>\\n <mn>1.90</mn>\\n </mrow>\\n <annotation>$6.17 \\\\pm 1.90$</annotation>\\n </semantics></math>. After implementing ACER-based treatment planning, <span></span><math>\\n <semantics>\\n <mrow>\\n <mn>92.29</mn>\\n <mo>%</mo>\\n </mrow>\\n <annotation>$92.29\\\\%$</annotation>\\n </semantics></math> of the cases achieve a perfect score of 9, with only <span></span><math>\\n <semantics>\\n <mrow>\\n <mn>6.65</mn>\\n <mo>%</mo>\\n </mrow>\\n <annotation>$6.65\\\\%$</annotation>\\n </semantics></math> scoring between 8 and 9, and no cases being below 7. The corresponding mean <span></span><math>\\n <semantics>\\n <mo>±</mo>\\n <annotation>$\\\\pm$</annotation>\\n </semantics></math> standard deviation is <span></span><math>\\n <semantics>\\n <mrow>\\n <mn>8.92</mn>\\n <mo>±</mo>\\n <mn>0.29</mn>\\n </mrow>\\n <annotation>$8.92 \\\\pm 0.29$</annotation>\\n </semantics></math>. This performance highlights the ACER agent's high generality across patient data from various sources. Further analysis indicates that the ACER agent effectively prioritizes leading reasonable TPP tuning actions over obviously unsuitable ones by several orders of magnitude, showing its efficacy. Additionally, results from FGSM attacks demonstrate that the ACER-based agent remains comparatively robust against various levels of perturbation.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>We successfully trained a DRL agent using the ACER technique for high-quality treatment planning in prostate cancer IMRT. 
It achieves high generality across diverse patient datasets and exhibits high robustness against adversarial attacks.</p>\\n </section>\\n </div>\",\"PeriodicalId\":18384,\"journal\":{\"name\":\"Medical physics\",\"volume\":\"52 7\",\"pages\":\"\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/mp.17915\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical physics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/mp.17915\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical physics","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/mp.17915","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Actor critic with experience replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy
Background
Achieving highly efficient treatment planning in intensity-modulated radiotherapy (IMRT) is challenging due to the complex interactions between radiation beams and the human body. The introduction of artificial intelligence (AI) has enabled automated treatment planning, significantly improving efficiency. However, existing automatic treatment planning agents often rely on supervised or unsupervised AI models that require large datasets of high-quality patient data for training. Additionally, these networks generally do not transfer across patient cases from different institutions and can be vulnerable to adversarial attacks. Deep reinforcement learning (DRL), which mimics the trial-and-error process used by human planners, offers a promising new approach to address these challenges.
Purpose
This work aims to develop a stochastic-policy DRL agent for automatic treatment planning that trains effectively on limited datasets, applies universally across diverse patient datasets, and performs robustly under adversarial attacks.
Methods
We employ an actor–critic with experience replay (ACER) architecture to develop the automatic treatment planning agent. This agent drives the treatment planning system (TPS) for inverse treatment planning by automatically tuning treatment planning parameters (TPPs). Our testbed consists of prostate cancer IMRT patient cases, each with one target and two organs at risk (OARs), and an action space of 18 discrete TPP tuning actions. The network takes dose–volume histograms (DVHs) as input and outputs a policy for effective TPP tuning, together with an evaluation function for that policy. Training uses DVHs from treatment plans generated by an in-house TPS under randomized TPPs for a single patient case, with validation on two other independent cases. Both online asynchronous learning and offline, sample-efficient experience replay are employed to update the network parameters. After training, six groups comprising more than 300 initial treatment plans drawn from three datasets were used for testing; these groups have beam and anatomical configurations distinct from those of the training case. The ProKnow scoring system for prostate cancer IMRT, with a maximum score of 9, is used to evaluate plan quality. The robustness of the network is further assessed through adversarial attacks using the fast gradient sign method (FGSM).
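To make the setup concrete, here is a minimal PyTorch sketch (not the authors' code) of a discrete-action ACER-style network for this task: a shared trunk reads a concatenated DVH state and feeds a stochastic policy head over the 18 TPP tuning actions plus a per-action Q-value head, and a truncated importance weight supports off-policy replay updates. The DVH bin count, hidden width, and truncation constant `c` are illustrative assumptions, not values from the paper; Retrace targets and ACER's trust-region step are omitted for brevity.

```python
import torch
import torch.nn as nn

N_BINS = 100        # dose bins per DVH curve (an assumption, not fixed by the paper)
N_STRUCTURES = 3    # one target + two OARs, per the paper
N_ACTIONS = 18      # discrete TPP tuning actions, per the paper

class ActorCritic(nn.Module):
    """Shared trunk with a policy (actor) head and a Q-value (critic) head."""
    def __init__(self, state_dim=N_BINS * N_STRUCTURES, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, N_ACTIONS)  # actor: TPP-tuning policy
        self.q_head = nn.Linear(hidden, N_ACTIONS)       # critic: Q(s, a) per action

    def forward(self, dvh_state):
        h = self.trunk(dvh_state)
        pi = torch.softmax(self.policy_head(h), dim=-1)  # stochastic policy over actions
        q = self.q_head(h)                               # evaluation of each action
        return pi, q

def truncated_is_weight(pi_now, mu_behavior, action, c=10.0):
    """Truncated importance weight used in ACER's off-policy replay updates."""
    rho = pi_now.gather(-1, action.unsqueeze(-1)) / mu_behavior.gather(-1, action.unsqueeze(-1))
    return rho.clamp(max=c)

# Example rollout step: sample a TPP-tuning action from the stochastic policy.
net = ActorCritic()
state = torch.rand(1, N_BINS * N_STRUCTURES)             # placeholder DVH features
pi, q = net(state)
action = torch.distributions.Categorical(pi).sample()    # shape (1,)
```

In full ACER, the clipped weight scales the policy gradient on replayed samples while a bias-correction term accounts for the truncated probability mass, which is what keeps off-policy replay both sample-efficient and stable.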
Results
Despite being trained on treatment plans from a single patient case, the network converges efficiently when validated on two independent cases. In testing, the mean ± standard deviation of the plan scores across all test cases before ACER-based treatment planning is 6.17 ± 1.90. After ACER-based treatment planning, 92.29% of the cases achieve a perfect score of 9, only 6.65% score between 8 and 9, and no case scores below 7; the corresponding mean ± standard deviation is 8.92 ± 0.29. This performance highlights the ACER agent's high generality across patient data from various sources. Further analysis indicates that the ACER agent effectively prioritizes reasonable TPP tuning actions over obviously unsuitable ones by several orders of magnitude, demonstrating its efficacy. Additionally, results from FGSM attacks show that the ACER-based agent remains comparatively robust across various levels of perturbation.
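The FGSM probe reported above can likewise be sketched in a few lines. Assuming the hypothetical ActorCritic network from the earlier sketch and DVH inputs normalized to [0, 1], one possible attack perturbs the DVH state in the gradient-sign direction that most degrades the policy's confidence in its own greedy action; the loss choice and epsilon values here are assumptions, not the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(net, dvh_state, epsilon):
    """One-step FGSM: nudge the DVH input along the gradient sign that
    most increases the policy's loss on its own greedy action."""
    x = dvh_state.clone().detach().requires_grad_(True)
    pi, _ = net(x)
    greedy = pi.argmax(dim=-1)                        # action the agent would favor
    loss = F.nll_loss(torch.log(pi + 1e-12), greedy)  # NLL of the greedy action
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()               # FGSM perturbation
    return x_adv.clamp(0.0, 1.0).detach()             # keep DVH fractions in [0, 1]

# `ActorCritic` is the hypothetical network defined in the earlier sketch.
net = ActorCritic()
state = torch.rand(1, 300)                            # placeholder normalized DVH state

# Probe robustness at several perturbation levels (epsilons are illustrative).
for eps in (0.01, 0.03, 0.1):
    x_adv = fgsm_perturb(net, state, eps)
    pi_adv, _ = net(x_adv)                            # compare pi_adv against the clean policy
```

A robust agent should keep ranking reasonable TPP tuning actions ahead of unsuitable ones as epsilon grows, which is the behavior the paper reports for its ACER-based agent.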
Conclusions
We successfully trained a DRL agent using the ACER technique for high-quality treatment planning in prostate cancer IMRT. It achieves high generality across diverse patient datasets and exhibits high robustness against adversarial attacks.
About the journal:
Medical Physics publishes original, high-impact physics, imaging science, and engineering research that advances patient diagnosis and therapy through contributions in: 1) basic science developments with high potential for clinical translation; 2) clinical applications of cutting-edge engineering and physics innovations; and 3) broadly applicable and innovative clinical physics developments.
Medical Physics is a journal of global scope and reach. By publishing in Medical Physics, your research will reach an international, multidisciplinary audience including practicing medical physicists as well as physics- and engineering-based translational scientists. We work closely with authors of promising articles to improve their quality.