Enhancing auto-contouring with large language model in high-dose rate brachytherapy for cervical cancers

IF 3.2 · CAS Zone 2 (Medicine) · JCR Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Medical physics Pub Date : 2025-09-23 DOI:10.1002/mp.70034

Jing Wang, Jiahan Zhang, Kaida Yang, Beth Bradshaw Ghavidel, Benyamin Khajetash, Abolfazl Sarikhani, Mohammad Houshyari, Tian Liu, Yang Lei, Meysam Tavakoli



Abstract

BACKGROUND

High-dose-rate brachytherapy (HDR-BT) is a cornerstone of cervical cancer (CC) treatment, requiring the precise delineation of high-risk clinical target volumes (HR-CTV) and organs at risk (OARs) for effective dose delivery and toxicity reduction. However, the time-sensitive nature of HDR-BT planning and its reliance on expert contouring introduce inter- and intra-observer variability, posing challenges for consistent and accurate treatment planning.

PURPOSE

This study proposes a novel deep learning (DL)-based auto-segmentation framework, guided by task-specific prompts generated from large language models (LLMs), to address these challenges and improve segmentation accuracy and efficiency.

METHODS

A retrospective dataset of 32 CC patients, comprising 124 planning computed tomography (pCT) images, was used. The framework integrates clinical guidelines for organ contouring from the American Brachytherapy Society (ABS), the European Society for Radiotherapy and Oncology (ESTRO), and the International Commission on Radiation Units and Measurements (ICRU). LLMs, particularly ChatGPT, extract domain knowledge from these contouring guidelines to generate task-specific prompts, which guide a Swin transformer-based encoder and a fully convolutional network (FCN) decoder for segmentation. The DL pipeline was evaluated on the HR-CTV and the OARs, including the bladder, rectum, and sigmoid. Performance was assessed with the Dice similarity coefficient (DSC), 95th-percentile Hausdorff distance (HD95%), mean surface distance (MSD), and center-of-mass distance (CMD). An ablation study compared the prompt-guided approach with a baseline model without prompt guidance. Statistical differences were tested with two-tailed paired t-tests, and p-values were adjusted with the Benjamini–Hochberg method to correct for multiple comparisons; results with adjusted p < 0.05 were deemed significant. Cohen's d was calculated to quantify effect sizes.
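The statistical steps named in the methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the per-case DSC values, and the raw p-values are all made up for the example, and Cohen's d is computed in its paired form (mean difference divided by the standard deviation of the differences), which is only one of several variants and is an assumption here because the abstract does not say which one was used. Only the t statistic is computed; in practice a library routine such as `scipy.stats.ttest_rel` would return the two-tailed p-value.

```python
import math
import statistics


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for a paired t-test on the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def cohens_d_paired(x, y):
    """Paired-samples effect size: mean difference / SD of the differences (assumed variant)."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-case DSC values for one structure (not data from the paper).
with_prompt = [0.80, 0.78, 0.82, 0.79]
baseline = [0.76, 0.77, 0.78, 0.76]

t_stat, dof = paired_t_statistic(with_prompt, baseline)
effect = cohens_d_paired(with_prompt, baseline)

# Hypothetical raw p-values, one per structure/metric comparison.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]

print(f"t = {t_stat:.3f} (df = {dof}), Cohen's d = {effect:.3f}")
print("adjusted p:", [round(p, 4) for p in adj_p])
print("significant at 0.05:", significant)
```

With these made-up inputs, only the smallest raw p-value remains below 0.05 after adjustment, which shows why the correction matters when several structures and metrics are tested together.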

RESULTS

The proposed framework achieved the highest segmentation accuracy for the bladder (DSC of 0.91 ± 0.07), followed by the HR-CTV (DSC of 0.80 ± 0.08) and the rectum (DSC of 0.78 ± 0.07), with lower accuracy for the sigmoid (DSC of 0.63 ± 0.15) due to its small size and irregular shape. Boundary precision was highest for the HR-CTV (HD95%: 6.32 ± 2.31 mm). The ablation study confirmed the contribution of prompt guidance, with statistically significant improvements in DSC and/or HD95% (p < 0.05) for all OARs. Prompt guidance, however, did not improve the accuracy of HR-CTV delineation.
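For readers unfamiliar with the reported metrics, the sketch below computes DSC, CMD, MSD, and HD95% on two toy binary masks represented as sets of voxel indices. It is an illustration under stated assumptions, not the authors' evaluation code: the helper names, the toy cubes, and the 1 mm voxel spacing are hypothetical, surfaces are taken as voxels with a 6-connected neighbour outside the mask, and HD95% follows one common convention (the larger of the two directed 95th-percentile surface distances). Other definitions exist and give slightly different numbers.

```python
import math
from itertools import product


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def centre_of_mass(mask, spacing):
    """Centre of mass of a voxel set, in mm."""
    n = len(mask)
    return tuple(sum(v[i] for v in mask) * spacing[i] / n for i in range(3))


def cmd(a, b, spacing):
    """Center-of-mass distance in mm."""
    return math.dist(centre_of_mass(a, spacing), centre_of_mass(b, spacing))


def surface(mask):
    """Voxels with at least one 6-connected neighbour outside the mask."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {v for v in mask
            if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask for dx, dy, dz in offsets)}


def directed_distances(src, dst, spacing):
    """Distance in mm from each voxel in src to its nearest voxel in dst (brute force)."""
    def to_mm(v):
        return tuple(v[i] * spacing[i] for i in range(3))
    dst_mm = [to_mm(v) for v in dst]
    return [min(math.dist(to_mm(s), d) for d in dst_mm) for s in src]


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def hd95_and_msd(a, b, spacing):
    """Return (HD95, MSD) in mm from the two directed surface-distance lists."""
    sa, sb = surface(a), surface(b)
    d_ab = directed_distances(sa, sb, spacing)
    d_ba = directed_distances(sb, sa, spacing)
    hd95 = max(percentile(d_ab, 95), percentile(d_ba, 95))
    msd = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    return hd95, msd


def cube(origin, size):
    """Toy mask: a solid cube of voxels (hypothetical test shape)."""
    ox, oy, oz = origin
    return {(ox + i, oy + j, oz + k) for i, j, k in product(range(size), repeat=3)}


spacing = (1.0, 1.0, 1.0)      # assumed isotropic 1 mm voxels
truth = cube((0, 0, 0), 4)     # stand-in for a manual contour
pred = cube((1, 0, 0), 4)      # same shape shifted by one voxel

dsc_value = dice(truth, pred)
cmd_value = cmd(truth, pred, spacing)
hd95_value, msd_value = hd95_and_msd(truth, pred, spacing)
print(f"DSC = {dsc_value:.2f}, CMD = {cmd_value:.2f} mm, "
      f"HD95 = {hd95_value:.2f} mm, MSD = {msd_value:.3f} mm")
```

For the one-voxel shift in this toy case the overlap is 48 of 64 voxels, giving a DSC of 0.75, while CMD and HD95% are both 1 mm. The example shows that DSC measures volume overlap whereas the distance metrics measure boundary and position error, which is why a structure can rank differently on DSC and on HD95%.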

CONCLUSIONS

This study demonstrates the feasibility and effectiveness of integrating LLM-generated task-specific prompts with DL-based segmentation for HDR-BT in CC. The proposed framework enhances segmentation consistency to support accurate treatment planning, addressing critical challenges in HDR-BT workflows.

