Advancing patient education in PRRT through large language models: challenges and potential.

IF 1.8 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
American journal of nuclear medicine and molecular imaging, Volume 15(4), pages 146-152. Pub Date: 2025-08-15, eCollection Date: 2025-01-01. DOI: 10.62347/OAHP6281. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444397/pdf/
Tilman Speicher, Moritz B Bastian, Arne Blickle, Armin Atzinger, Florian Rosar, Caroline Burgard, Samer Ezziddin
{"title":"通过大型语言模型推进PRRT患者教育:挑战和潜力。","authors":"Tilman Speicher, Moritz B Bastian, Arne Blickle, Armin Atzinger, Florian Rosar, Caroline Burgard, Samer Ezziddin","doi":"10.62347/OAHP6281","DOIUrl":null,"url":null,"abstract":"<p><p>The increasing use of artificial intelligence (AI) chatbots for patient education raises questions about their accuracy, readability, and conciseness in delivering medical information. This study evaluates the performance of ChatGPT 4o and DeepSeek V3 in answering common patient inquiries about Peptide Receptor Radionuclide Therapy (PRRT). Twelve frequently asked patient questions regarding PRRT were submitted to both chatbots. The responses were assessed by nine professionals using a blinded survey, scoring accuracy, conciseness, and readability on a five-point scale. Statistical analyses included the Mann-Whitney U test for nonparametric data and the Chi-square test for medically incorrect responses. A total of 324 individual assessments were conducted. No significant differences were found in accuracy between ChatGPT 4o (mean 4.43) and DeepSeek V3 (mean 4.56; <i>P</i> = 0.0909) or in readability between ChatGPT 4o (mean 4.38) and DeepSeek V3 (mean 4.25; <i>P</i> = 0.1236). However, ChatGPT 4o provided significantly more concise responses (mean 4.55) compared to DeepSeek V3 (mean 4.24; <b><i>P</i> = 0.0013</b>). Medically incorrect information defined as accuracy ≤ 3 was present in 7-8% of chatbot responses, with no significant difference between the two models (<i>P</i> = 0.8005). Both AI chatbots demonstrated strong performance in providing medical information on PRRT, with ChatGPT 4o excelling in conciseness. However, the presence of medical inaccuracies highlights the need for physician oversight when using AI chatbots for patient education. Future research should explore methods to enhance AI reliability and personalization in clinical communication.</p>","PeriodicalId":7572,"journal":{"name":"American journal of nuclear medicine and molecular imaging","volume":"15 4","pages":"146-152"},"PeriodicalIF":1.8000,"publicationDate":"2025-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444397/pdf/","citationCount":"0","resultStr":"{\"title\":\"Advancing patient education in PRRT through large language models: challenges and potential.\",\"authors\":\"Tilman Speicher, Moritz B Bastian, Arne Blickle, Armin Atzinger, Florian Rosar, Caroline Burgard, Samer Ezziddin\",\"doi\":\"10.62347/OAHP6281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The increasing use of artificial intelligence (AI) chatbots for patient education raises questions about their accuracy, readability, and conciseness in delivering medical information. This study evaluates the performance of ChatGPT 4o and DeepSeek V3 in answering common patient inquiries about Peptide Receptor Radionuclide Therapy (PRRT). Twelve frequently asked patient questions regarding PRRT were submitted to both chatbots. The responses were assessed by nine professionals using a blinded survey, scoring accuracy, conciseness, and readability on a five-point scale. Statistical analyses included the Mann-Whitney U test for nonparametric data and the Chi-square test for medically incorrect responses. A total of 324 individual assessments were conducted. 
No significant differences were found in accuracy between ChatGPT 4o (mean 4.43) and DeepSeek V3 (mean 4.56; <i>P</i> = 0.0909) or in readability between ChatGPT 4o (mean 4.38) and DeepSeek V3 (mean 4.25; <i>P</i> = 0.1236). However, ChatGPT 4o provided significantly more concise responses (mean 4.55) compared to DeepSeek V3 (mean 4.24; <b><i>P</i> = 0.0013</b>). Medically incorrect information defined as accuracy ≤ 3 was present in 7-8% of chatbot responses, with no significant difference between the two models (<i>P</i> = 0.8005). Both AI chatbots demonstrated strong performance in providing medical information on PRRT, with ChatGPT 4o excelling in conciseness. However, the presence of medical inaccuracies highlights the need for physician oversight when using AI chatbots for patient education. Future research should explore methods to enhance AI reliability and personalization in clinical communication.</p>\",\"PeriodicalId\":7572,\"journal\":{\"name\":\"American journal of nuclear medicine and molecular imaging\",\"volume\":\"15 4\",\"pages\":\"146-152\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2025-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444397/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American journal of nuclear medicine and molecular imaging\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.62347/OAHP6281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American journal of nuclear medicine and molecular imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.62347/OAHP6281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract


The increasing use of artificial intelligence (AI) chatbots for patient education raises questions about their accuracy, readability, and conciseness in delivering medical information. This study evaluates the performance of ChatGPT 4o and DeepSeek V3 in answering common patient inquiries about Peptide Receptor Radionuclide Therapy (PRRT). Twelve frequently asked patient questions regarding PRRT were submitted to both chatbots. The responses were assessed by nine professionals in a blinded survey, scoring accuracy, conciseness, and readability on a five-point scale. Statistical analyses included the Mann-Whitney U test for nonparametric data and the Chi-square test for medically incorrect responses. A total of 324 individual assessments were conducted. No significant differences were found in accuracy between ChatGPT 4o (mean 4.43) and DeepSeek V3 (mean 4.56; P = 0.0909) or in readability between ChatGPT 4o (mean 4.38) and DeepSeek V3 (mean 4.25; P = 0.1236). However, ChatGPT 4o provided significantly more concise responses (mean 4.55) than DeepSeek V3 (mean 4.24; P = 0.0013). Medically incorrect information, defined as an accuracy score ≤ 3, was present in 7-8% of chatbot responses, with no significant difference between the two models (P = 0.8005). Both AI chatbots demonstrated strong performance in providing medical information on PRRT, with ChatGPT 4o excelling in conciseness. However, the presence of medical inaccuracies highlights the need for physician oversight when using AI chatbots for patient education. Future research should explore methods to enhance AI reliability and personalization in clinical communication.
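To make the reported analysis concrete, below is a minimal Python sketch of the two statistical tests named in the abstract, run on hypothetical rating data; the score arrays, group sizes, and score distributions are invented placeholders, not the study's data.

```python
# Minimal sketch of the study's two reported tests on hypothetical data.
# Assumption: 108 five-point ratings per model (9 raters x 12 questions);
# the paper's raw scores are not public, so these arrays are placeholders.
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

rng = np.random.default_rng(0)
chatgpt_scores = rng.choice([3, 4, 5], size=108, p=[0.08, 0.35, 0.57])
deepseek_scores = rng.choice([3, 4, 5], size=108, p=[0.07, 0.25, 0.68])

# Mann-Whitney U test for the ordinal (nonparametric) rating scales.
u_stat, p_ratings = mannwhitneyu(chatgpt_scores, deepseek_scores,
                                 alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, P = {p_ratings:.4f}")

# Chi-square test on the share of medically incorrect responses,
# defined (as in the paper) as an accuracy score <= 3.
incorrect = [(chatgpt_scores <= 3).sum(), (deepseek_scores <= 3).sum()]
correct = [108 - incorrect[0], 108 - incorrect[1]]
chi2, p_incorrect, dof, _ = chi2_contingency([incorrect, correct])
print(f"Chi-square = {chi2:.2f}, P = {p_incorrect:.4f}")
```

With rating scales this coarse, the Mann-Whitney U test is the standard choice over a t-test, since five-point scores are ordinal rather than normally distributed.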

Source journal
American journal of nuclear medicine and molecular imaging
RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Self-citation rate: 4.00%
Articles published: 4
Journal description
The scope of AJNMMI encompasses all areas of molecular imaging, including but not limited to: positron emission tomography (PET), single-photon emission computed tomography (SPECT), molecular magnetic resonance imaging, magnetic resonance spectroscopy, optical bioluminescence, optical fluorescence, targeted ultrasound, photoacoustic imaging, etc. AJNMMI welcomes original and review articles on both clinical investigation and preclinical research. Occasionally, special topic issues, short communications, editorials, and invited perspectives will also be published. Manuscripts, including figures and tables, must be original and not under consideration by another journal.