Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy

Praveenbalaji Rajendran, Yong Yang, Thomas R. Niedermayr, Michael Gensheimer, Beth Beadle, Quynh-Thu Le, Lei Xing, Xianjin Dai
{"title":"Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy","authors":"Praveenbalaji Rajendran, Yong Yang, Thomas R. Niedermayr, Michael Gensheimer, Beth Beadle, Quynh-Thu Le, Lei Xing, Xianjin Dai","doi":"arxiv-2407.07296","DOIUrl":null,"url":null,"abstract":"Radiation therapy (RT) is one of the most effective treatments for cancer,\nand its success relies on the accurate delineation of targets. However, target\ndelineation is a comprehensive medical decision that currently relies purely on\nmanual processes by human experts. Manual delineation is time-consuming,\nlaborious, and subject to interobserver variations. Although the advancements\nin artificial intelligence (AI) techniques have significantly enhanced the\nauto-contouring of normal tissues, accurate delineation of RT target volumes\nremains a challenge. In this study, we propose a visual language model-based RT\ntarget volume auto-delineation network termed Radformer. The Radformer utilizes\na hierarichal vision transformer as the backbone and incorporates large\nlanguage models to extract text-rich features from clinical data. We introduce\na visual language attention module (VLAM) for integrating visual and linguistic\nfeatures for language-aware visual encoding (LAVE). The Radformer has been\nevaluated on a dataset comprising 2985 patients with head-and-neck cancer who\nunderwent RT. Metrics, including the Dice similarity coefficient (DSC),\nintersection over union (IOU), and 95th percentile Hausdorff distance (HD95),\nwere used to evaluate the performance of the model quantitatively. 
Our results\ndemonstrate that the Radformer has superior segmentation performance compared\nto other state-of-the-art models, validating its potential for adoption in RT\npractice.","PeriodicalId":501378,"journal":{"name":"arXiv - PHYS - Medical Physics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Medical Physics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.07296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Radiation therapy (RT) is one of the most effective treatments for cancer, and its success relies on the accurate delineation of targets. However, target delineation is a comprehensive medical decision that currently relies purely on manual processes by human experts. Manual delineation is time-consuming, laborious, and subject to interobserver variations. Although advancements in artificial intelligence (AI) techniques have significantly enhanced the auto-contouring of normal tissues, accurate delineation of RT target volumes remains a challenge. In this study, we propose a visual language model-based RT target volume auto-delineation network termed Radformer. The Radformer utilizes a hierarchical vision transformer as the backbone and incorporates large language models to extract text-rich features from clinical data. We introduce a visual language attention module (VLAM) for integrating visual and linguistic features for language-aware visual encoding (LAVE). The Radformer has been evaluated on a dataset comprising 2985 patients with head-and-neck cancer who underwent RT. Metrics, including the Dice similarity coefficient (DSC), intersection over union (IoU), and 95th percentile Hausdorff distance (HD95), were used to evaluate the performance of the model quantitatively. Our results demonstrate that the Radformer has superior segmentation performance compared to other state-of-the-art models, validating its potential for adoption in RT practice.
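The abstract's visual language attention module (VLAM) fuses visual and linguistic features for language-aware visual encoding. The paper's exact architecture is not given in this abstract, so the following is a hypothetical sketch of one common way such fusion is built: visual tokens act as queries that attend to LLM text embeddings via cross-attention, with a residual connection. The class name, dimensions, and layer choices are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VisualLanguageAttention(nn.Module):
    """Hypothetical VLAM-style fusion block (illustrative only):
    visual tokens (queries) attend to text embeddings (keys/values),
    yielding language-aware visual features."""

    def __init__(self, vis_dim: int, txt_dim: int, num_heads: int = 8):
        super().__init__()
        # Project text features into the visual embedding space.
        self.txt_proj = nn.Linear(txt_dim, vis_dim)
        self.attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (batch, n_visual, vis_dim); txt_tokens: (batch, n_text, txt_dim)
        txt = self.txt_proj(txt_tokens)
        fused, _ = self.attn(query=vis_tokens, key=txt, value=txt)
        # Residual fusion keeps the original visual signal intact.
        return self.norm(vis_tokens + fused)
```

In a sketch like this, the module slots between backbone stages so each level of the hierarchical vision transformer can condition on the clinical text.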
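The three evaluation metrics named above have standard definitions on binary segmentation masks. As a minimal sketch (using NumPy/SciPy; voxel spacing assumed isotropic, which a real evaluation would correct for):

```python
import numpy as np
from scipy import ndimage

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A∩B| / (|A|+|B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def intersection_over_union(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU (Jaccard index): |A∩B| / |A∪B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def hd95(pred: np.ndarray, gt: np.ndarray) -> float:
    """95th percentile symmetric Hausdorff distance between mask
    surfaces, in voxel units (no physical spacing applied)."""
    # Surface voxels = mask minus its morphological erosion.
    surf_p = pred ^ ndimage.binary_erosion(pred)
    surf_g = gt ^ ndimage.binary_erosion(gt)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    dist_to_g = ndimage.distance_transform_edt(~surf_g)
    dist_to_p = ndimage.distance_transform_edt(~surf_p)
    d_pg = dist_to_g[surf_p]  # pred surface -> gt surface
    d_gp = dist_to_p[surf_g]  # gt surface -> pred surface
    return float(np.percentile(np.concatenate([d_pg, d_gp]), 95))
```

Taking the 95th percentile rather than the maximum makes HD95 robust to a few outlier surface voxels, which is why it is preferred over the plain Hausdorff distance in segmentation benchmarks.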