Advancing patient education in PRRT through large language models: challenges and potential.

IF 1.8 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
American journal of nuclear medicine and molecular imaging, Volume 15(4), pages 146-152. Pub Date: 2025-08-15, eCollection Date: 2025-01-01. DOI: 10.62347/OAHP6281. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444397/pdf/
Tilman Speicher, Moritz B Bastian, Arne Blickle, Armin Atzinger, Florian Rosar, Caroline Burgard, Samer Ezziddin
{"title":"通过大型语言模型推进PRRT患者教育:挑战和潜力。","authors":"Tilman Speicher, Moritz B Bastian, Arne Blickle, Armin Atzinger, Florian Rosar, Caroline Burgard, Samer Ezziddin","doi":"10.62347/OAHP6281","DOIUrl":null,"url":null,"abstract":"<p><p>The increasing use of artificial intelligence (AI) chatbots for patient education raises questions about their accuracy, readability, and conciseness in delivering medical information. This study evaluates the performance of ChatGPT 4o and DeepSeek V3 in answering common patient inquiries about Peptide Receptor Radionuclide Therapy (PRRT). Twelve frequently asked patient questions regarding PRRT were submitted to both chatbots. The responses were assessed by nine professionals using a blinded survey, scoring accuracy, conciseness, and readability on a five-point scale. Statistical analyses included the Mann-Whitney U test for nonparametric data and the Chi-square test for medically incorrect responses. A total of 324 individual assessments were conducted. No significant differences were found in accuracy between ChatGPT 4o (mean 4.43) and DeepSeek V3 (mean 4.56; <i>P</i> = 0.0909) or in readability between ChatGPT 4o (mean 4.38) and DeepSeek V3 (mean 4.25; <i>P</i> = 0.1236). However, ChatGPT 4o provided significantly more concise responses (mean 4.55) compared to DeepSeek V3 (mean 4.24; <b><i>P</i> = 0.0013</b>). Medically incorrect information defined as accuracy ≤ 3 was present in 7-8% of chatbot responses, with no significant difference between the two models (<i>P</i> = 0.8005). Both AI chatbots demonstrated strong performance in providing medical information on PRRT, with ChatGPT 4o excelling in conciseness. However, the presence of medical inaccuracies highlights the need for physician oversight when using AI chatbots for patient education. Future research should explore methods to enhance AI reliability and personalization in clinical communication.</p>","PeriodicalId":7572,"journal":{"name":"American journal of nuclear medicine and molecular imaging","volume":"15 4","pages":"146-152"},"PeriodicalIF":1.8000,"publicationDate":"2025-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444397/pdf/","citationCount":"0","resultStr":"{\"title\":\"Advancing patient education in PRRT through large language models: challenges and potential.\",\"authors\":\"Tilman Speicher, Moritz B Bastian, Arne Blickle, Armin Atzinger, Florian Rosar, Caroline Burgard, Samer Ezziddin\",\"doi\":\"10.62347/OAHP6281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The increasing use of artificial intelligence (AI) chatbots for patient education raises questions about their accuracy, readability, and conciseness in delivering medical information. This study evaluates the performance of ChatGPT 4o and DeepSeek V3 in answering common patient inquiries about Peptide Receptor Radionuclide Therapy (PRRT). Twelve frequently asked patient questions regarding PRRT were submitted to both chatbots. The responses were assessed by nine professionals using a blinded survey, scoring accuracy, conciseness, and readability on a five-point scale. Statistical analyses included the Mann-Whitney U test for nonparametric data and the Chi-square test for medically incorrect responses. A total of 324 individual assessments were conducted. 
No significant differences were found in accuracy between ChatGPT 4o (mean 4.43) and DeepSeek V3 (mean 4.56; <i>P</i> = 0.0909) or in readability between ChatGPT 4o (mean 4.38) and DeepSeek V3 (mean 4.25; <i>P</i> = 0.1236). However, ChatGPT 4o provided significantly more concise responses (mean 4.55) compared to DeepSeek V3 (mean 4.24; <b><i>P</i> = 0.0013</b>). Medically incorrect information defined as accuracy ≤ 3 was present in 7-8% of chatbot responses, with no significant difference between the two models (<i>P</i> = 0.8005). Both AI chatbots demonstrated strong performance in providing medical information on PRRT, with ChatGPT 4o excelling in conciseness. However, the presence of medical inaccuracies highlights the need for physician oversight when using AI chatbots for patient education. Future research should explore methods to enhance AI reliability and personalization in clinical communication.</p>\",\"PeriodicalId\":7572,\"journal\":{\"name\":\"American journal of nuclear medicine and molecular imaging\",\"volume\":\"15 4\",\"pages\":\"146-152\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2025-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444397/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American journal of nuclear medicine and molecular imaging\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.62347/OAHP6281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American journal of nuclear medicine and molecular imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.62347/OAHP6281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract


The increasing use of artificial intelligence (AI) chatbots for patient education raises questions about their accuracy, readability, and conciseness in delivering medical information. This study evaluates the performance of ChatGPT 4o and DeepSeek V3 in answering common patient inquiries about Peptide Receptor Radionuclide Therapy (PRRT). Twelve frequently asked patient questions regarding PRRT were submitted to both chatbots. The responses were assessed by nine professionals in a blinded survey, scoring accuracy, conciseness, and readability on a five-point scale. Statistical analyses included the Mann-Whitney U test for nonparametric data and the Chi-square test for medically incorrect responses. A total of 324 individual assessments were conducted. No significant differences were found in accuracy between ChatGPT 4o (mean 4.43) and DeepSeek V3 (mean 4.56; P = 0.0909) or in readability between ChatGPT 4o (mean 4.38) and DeepSeek V3 (mean 4.25; P = 0.1236). However, ChatGPT 4o provided significantly more concise responses (mean 4.55) than DeepSeek V3 (mean 4.24; P = 0.0013). Medically incorrect information, defined as an accuracy score ≤ 3, was present in 7-8% of chatbot responses, with no significant difference between the two models (P = 0.8005). Both AI chatbots demonstrated strong performance in providing medical information on PRRT, with ChatGPT 4o excelling in conciseness. However, the presence of medical inaccuracies highlights the need for physician oversight when using AI chatbots for patient education. Future research should explore methods to enhance AI reliability and personalization in clinical communication.
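To make the reported analysis concrete, below is a minimal Python sketch of the two statistical tests named in the abstract, run on hypothetical rating data; the score arrays, group sizes, and score distributions are invented placeholders, not the study's data.

```python
# Minimal sketch of the study's two reported tests on hypothetical data.
# Assumption: 108 five-point ratings per model (9 raters x 12 questions);
# the paper's raw scores are not public, so these arrays are placeholders.
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

rng = np.random.default_rng(0)
chatgpt_scores = rng.choice([3, 4, 5], size=108, p=[0.08, 0.35, 0.57])
deepseek_scores = rng.choice([3, 4, 5], size=108, p=[0.07, 0.25, 0.68])

# Mann-Whitney U test for the ordinal (nonparametric) rating scales.
u_stat, p_ratings = mannwhitneyu(chatgpt_scores, deepseek_scores,
                                 alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, P = {p_ratings:.4f}")

# Chi-square test on the share of medically incorrect responses,
# defined (as in the paper) as an accuracy score <= 3.
incorrect = [(chatgpt_scores <= 3).sum(), (deepseek_scores <= 3).sum()]
correct = [108 - incorrect[0], 108 - incorrect[1]]
chi2, p_incorrect, dof, _ = chi2_contingency([incorrect, correct])
print(f"Chi-square = {chi2:.2f}, P = {p_incorrect:.4f}")
```

With rating scales this coarse, the Mann-Whitney U test is the standard choice over a t-test, since five-point scores are ordinal rather than normally distributed.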

Source journal
American journal of nuclear medicine and molecular imaging
RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Self-citation rate: 4.00%
Articles published: 4
Journal description
The scope of AJNMMI encompasses all areas of molecular imaging, including but not limited to: positron emission tomography (PET), single-photon emission computed tomography (SPECT), molecular magnetic resonance imaging, magnetic resonance spectroscopy, optical bioluminescence, optical fluorescence, targeted ultrasound, photoacoustic imaging, etc. AJNMMI welcomes original and review articles on both clinical investigation and preclinical research. Occasionally, special topic issues, short communications, editorials, and invited perspectives will also be published. Manuscripts, including figures and tables, must be original and not under consideration by another journal.