Benchmarking a Foundation Large Language Model on its Ability to Relabel Structure Names in Accordance With the American Association of Physicists in Medicine Task Group-263 Report

IF 3.4 · CAS Zone 3 (Medicine) · JCR Q2 (Oncology)

Practical Radiation Oncology · Pub Date: 2024-11-01 · DOI: 10.1016/j.prro.2024.04.017

Jason Holmes, PhD; Lian Zhang, PhD; Yuzhen Ding, PhD; Hongying Feng, PhD; Zhengliang Liu, MS; Tianming Liu, PhD; William W. Wong, MD; Sujay A. Vora, MD; Jonathan B. Ashman, MD, PhD; Wei Liu, PhD

Volume 14, Issue 6, Pages e515-e521 · https://www.sciencedirect.com/science/article/pii/S1879850024000985


Abstract

Purpose

To introduce the concept of using large language models (LLMs) to relabel structure names in accordance with the American Association of Physicists in Medicine Task Group-263 standard and to establish a benchmark for future studies to reference.

Methods and Materials

Generative Pretrained Transformer (GPT)-4 was implemented within a Digital Imaging and Communications in Medicine (DICOM) server. Upon receiving a DICOM structure-set file, the server prompts GPT-4 to relabel the structure names according to the American Association of Physicists in Medicine Task Group-263 report. Results were evaluated for 3 disease sites: prostate, head and neck, and thorax. For each disease site, 150 patients were randomly selected for manually tuning the instruction prompt (in batches of 50), and 50 patients were randomly selected for evaluation. The structure names considered were those most likely to be relevant to studies that use structure contours from many patients.
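The relabeling step can be pictured as a prompt-and-parse loop. The sketch below is a minimal, hypothetical illustration, not the authors' implementation: the instruction text, the JSON reply format, and the names `build_prompt`, `relabel`, and `ask_llm` are all assumptions, and the model call is passed in as a plain function so the example runs without network access or an API key. Names the model fails to return are kept unchanged rather than dropped.

```python
import json
from typing import Callable, Dict, List

# Hypothetical instruction text; the authors' manually tuned prompt is not reproduced here.
INSTRUCTIONS = (
    "Relabel each radiotherapy structure name below according to the AAPM "
    "TG-263 nomenclature. Reply with a JSON object mapping each original "
    "name to its TG-263 name."
)


def build_prompt(structure_names: List[str]) -> str:
    """Combine the instructions with the structure names from one structure set."""
    listing = "\n".join(f"- {name}" for name in structure_names)
    return f"{INSTRUCTIONS}\n\nStructure names:\n{listing}"


def relabel(structure_names: List[str], ask_llm: Callable[[str], str]) -> Dict[str, str]:
    """Send one prompt per structure set and parse the model's JSON reply.

    `ask_llm` is any function that takes a prompt string and returns the model's
    text reply (an assumption standing in for a real LLM API call).
    """
    mapping = json.loads(ask_llm(build_prompt(structure_names)))
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object mapping old names to new names")
    # Fall back to the original name for anything the model did not return.
    return {name: mapping.get(name, name) for name in structure_names}


if __name__ == "__main__":
    # Stubbed, illustrative model reply; a real deployment would call an LLM here.
    def fake_llm(prompt: str) -> str:
        return json.dumps(
            {"prostate": "Prostate", "RECTUM": "Rectum", "Lt Fem Head": "Femur_Head_L"}
        )

    print(relabel(["prostate", "RECTUM", "Lt Fem Head", "bladder"], fake_llm))
```

In a DICOM setting, the input list would typically come from the RT Structure Set's `StructureSetROISequence` (the `ROIName` of each item), for example read with pydicom, and the returned mapping would be written back to the file; those steps are omitted to keep the sketch self-contained.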

Results

The per-patient accuracy was 97.2%, 98.3%, and 97.1% for prostate, head and neck, and thorax disease sites, respectively. On a per-structure basis, the clinical target volume was relabeled correctly in 100%, 95.3%, and 92.9% of cases, respectively.
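The abstract does not define its two metrics, so the sketch below assumes one plausible reading. Because 97.2% of 50 patients is not a whole number of patients, the per-patient figure is presumably not the share of patients with every structure correct; here it is taken as the mean, over patients, of each patient's fraction of correctly relabeled structures. Per-structure accuracy is taken as the share of occurrences of one structure type (such as the clinical target volume) that were relabeled correctly. The data layout, the function names, and the matching rule that CTV names begin with "CTV" are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical layout: one list per patient of (predicted, reference) name pairs.
Patient = List[Tuple[str, str]]


def per_patient_accuracy(patients: List[Patient]) -> float:
    """Mean, over patients, of the fraction of each patient's structures relabeled correctly."""
    fractions = [
        sum(pred == ref for pred, ref in pairs) / len(pairs)
        for pairs in patients
        if pairs  # skip patients with no evaluated structures
    ]
    if not fractions:
        raise ValueError("no evaluated structures")
    return sum(fractions) / len(fractions)


def per_structure_accuracy(patients: List[Patient], is_target: Callable[[str], bool]) -> float:
    """Fraction of reference structures selected by `is_target` that were relabeled correctly."""
    hits = [pred == ref for pairs in patients for pred, ref in pairs if is_target(ref)]
    if not hits:
        raise ValueError("no matching reference structures")
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Toy data for illustration only; these are not the study's results.
    patients = [
        [("Prostate", "Prostate"), ("Rectum", "Rectum"), ("CTV", "CTV")],
        [("Bladder", "Bladder"), ("CTV_7000", "CTV_7000"), ("Bowel", "Rectum")],
        [("PTV", "CTV_5000")],
    ]
    print(per_patient_accuracy(patients))  # mean of 1, 2/3 and 0
    print(per_structure_accuracy(patients, lambda ref: ref.startswith("CTV")))  # 2 of 3 CTVs
```

Pooling every structure across all patients before dividing is another reasonable definition; it weights patients with many structures more heavily and would generally give a different number.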

Conclusions

Given the accuracy of GPT-4 in relabeling structure names as presented in this work, LLMs are poised to become an important method for standardizing structure names in radiation oncology, especially considering the rapid advancements in LLM capabilities that are likely to continue.

