Large language models for pretreatment education in pediatric radiation oncology: A comparative evaluation study
Dominik Wawrzuta, Aleksandra Napieralska, Katarzyna Ludwikowska, Laimonas Jaruševičius, Anastasija Trofimoviča-Krasnorucka, Gints Rausis, Agata Szulc, Katarzyna Pędziwiatr, Kateřina Poláchová, Justyna Klejdysz, Marzanna Chojnacka
Clinical and Translational Radiation Oncology, volume 51, article 100914. Publication date: 2025-01-06. Journal Article.
DOI: https://doi.org/10.1016/j.ctro.2025.100914
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11762905/pdf/
Abstract
Background and purpose: Pediatric radiotherapy patients and their parents are usually aware of their need for radiotherapy early on, but they meet with a radiation oncologist later in their treatment. Consequently, they search for information online, often encountering unreliable sources. Large language models (LLMs) have the potential to serve as an educational pretreatment tool, providing reliable answers to their questions. We aimed to evaluate the responses provided by generative pre-trained transformers (GPT), the most popular subgroup of LLMs, to questions about pediatric radiation oncology.
Materials and methods: We collected pretreatment questions regarding radiotherapy from patients and parents. Responses were generated using GPT-3.5, GPT-4, and fine-tuned GPT-3.5, with fine-tuning based on pediatric radiotherapy guides from various institutions. Additionally, a radiation oncologist prepared answers to these questions. Finally, a multi-institutional group of nine pediatric radiotherapy experts conducted a blind review of responses, assessing reliability, concision, and comprehensibility.
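The blind review described above can be illustrated with a minimal sketch. It assumes each question has one response per source, that reviewers see only neutral labels, and that each rating is a numeric score per criterion. The function names, the label scheme, and the numeric scale are hypothetical; the abstract does not specify how the study implemented blinding or scoring.

```python
import random
from collections import defaultdict
from statistics import mean

# Sources and criteria named in the abstract; the identifiers are illustrative.
SOURCES = ["radiation_oncologist", "gpt-3.5", "gpt-4", "gpt-3.5-fine-tuned"]
CRITERIA = ["reliability", "concision", "comprehensibility"]


def blind_responses(responses, seed=0):
    """Shuffle each question's responses and hide their sources.

    responses: {question: {source: answer_text}}
    Returns (blinded, key):
      blinded: {question: {label: answer_text}}, shown to reviewers
      key:     {(question, label): source}, withheld until scoring is done
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    blinded, key = {}, {}
    for question, by_source in responses.items():
        sources = list(by_source)
        rng.shuffle(sources)
        blinded[question] = {}
        for i, source in enumerate(sources):
            label = f"Response {chr(ord('A') + i)}"
            blinded[question][label] = by_source[source]
            key[(question, label)] = source
    return blinded, key


def mean_scores_by_source(ratings, key):
    """Unblind ratings and average them per (source, criterion).

    ratings: iterable of (question, label, criterion, score) tuples,
    where score is a number on a hypothetical rating scale.
    """
    collected = defaultdict(list)
    for question, label, criterion, score in ratings:
        collected[(key[(question, label)], criterion)].append(score)
    return {pair: mean(scores) for pair, scores in collected.items()}
```

Keeping the label-to-source key separate from the material handed to reviewers is what makes the review blind in this sketch; unblinding happens only inside `mean_scores_by_source`, after all ratings are collected.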
Results: The radiation oncologist and GPT-4 provided the highest-quality responses, though GPT-4's answers were often excessively verbose. While fine-tuned GPT-3.5 generally outperformed basic GPT-3.5, it often provided overly simplistic answers. Inadequate responses were rare, occurring in 4% of GPT-generated responses across all models, primarily due to GPT-3.5 generating excessively long responses.
Conclusions: LLMs can be valuable tools for educating patients and their families before treatment in pediatric radiation oncology. Among them, only GPT-4 provides information of a quality comparable to that of a radiation oncologist, although it still occasionally generates poor-quality responses. GPT-3.5 models should be used cautiously, as they are more likely to produce inadequate answers to patient questions.