Advancing patient education in PRRT through large language models: challenges and potential.
Tilman Speicher, Moritz B Bastian, Arne Blickle, Armin Atzinger, Florian Rosar, Caroline Burgard, Samer Ezziddin
American Journal of Nuclear Medicine and Molecular Imaging, 15(4), 146-152. Published 2025-08-15 (eCollection 2025). DOI: 10.62347/OAHP6281. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444397/pdf/
Citations: 0
Abstract
The increasing use of artificial intelligence (AI) chatbots for patient education raises questions about their accuracy, readability, and conciseness in delivering medical information. This study evaluates the performance of ChatGPT 4o and DeepSeek V3 in answering common patient inquiries about Peptide Receptor Radionuclide Therapy (PRRT). Twelve frequently asked patient questions regarding PRRT were submitted to both chatbots. The responses were assessed by nine professionals in a blinded survey, scoring accuracy, conciseness, and readability on a five-point scale. Statistical analyses included the Mann-Whitney U test for the nonparametric rating data and the Chi-square test for the frequency of medically incorrect responses. A total of 324 individual assessments were conducted. No significant differences were found in accuracy between ChatGPT 4o (mean 4.43) and DeepSeek V3 (mean 4.56; P = 0.0909) or in readability between ChatGPT 4o (mean 4.38) and DeepSeek V3 (mean 4.25; P = 0.1236). However, ChatGPT 4o provided significantly more concise responses (mean 4.55) than DeepSeek V3 (mean 4.24; P = 0.0013). Medically incorrect information, defined as an accuracy score of ≤ 3, was present in 7-8% of chatbot responses, with no significant difference between the two models (P = 0.8005). Both AI chatbots demonstrated strong performance in providing medical information on PRRT, with ChatGPT 4o excelling in conciseness. However, the presence of medical inaccuracies highlights the need for physician oversight when AI chatbots are used for patient education. Future research should explore methods to enhance AI reliability and personalization in clinical communication.
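As an illustration of the analysis described above, the sketch below reproduces the two statistical tests on hypothetical rating data in Python with NumPy and SciPy. The arrays, sample sizes, and the incorrect_table helper are assumptions for demonstration only, not the study's actual data or code.

import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

# Hypothetical five-point accuracy ratings for each chatbot; the study
# reports 324 individual assessments in total, but the sizes here are
# illustrative placeholders, not the real dataset.
rng = np.random.default_rng(0)
chatgpt_accuracy = rng.integers(3, 6, size=108)   # placeholder scores, 3-5
deepseek_accuracy = rng.integers(3, 6, size=108)  # placeholder scores, 3-5

# Mann-Whitney U test for the nonparametric five-point rating scales.
u_stat, p_u = mannwhitneyu(chatgpt_accuracy, deepseek_accuracy,
                           alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, P = {p_u:.4f}")

# Chi-square test on a 2x2 table of medically incorrect (accuracy <= 3)
# versus correct responses for the two models.
def incorrect_table(a, b, threshold=3):
    # Hypothetical helper: counts of incorrect vs. correct responses per model.
    return [[(a <= threshold).sum(), (a > threshold).sum()],
            [(b <= threshold).sum(), (b > threshold).sum()]]

chi2, p_chi, dof, _ = chi2_contingency(incorrect_table(chatgpt_accuracy,
                                                       deepseek_accuracy))
print(f"Chi-square = {chi2:.2f} (dof = {dof}), P = {p_chi:.4f}")

Note that chi2_contingency applies Yates' continuity correction to 2x2 tables by default, which is a common choice for comparing incorrect-response rates between two groups.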
About the Journal
The scope of AJNMMI encompasses all areas of molecular imaging, including but not limited to positron emission tomography (PET), single-photon emission computed tomography (SPECT), molecular magnetic resonance imaging, magnetic resonance spectroscopy, optical bioluminescence, optical fluorescence, targeted ultrasound, and photoacoustic imaging. AJNMMI welcomes original and review articles on both clinical investigation and preclinical research. Occasionally, special topic issues, short communications, editorials, and invited perspectives will also be published. Manuscripts, including figures and tables, must be original and not under consideration by another journal.