Reliability and readability analysis of ChatGPT-4 and Google Bard as a patient information source for the most commonly applied radionuclide treatments in cancer patients

IF 1.6 | Zone 4 (Medicine) | Q3 | RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Revista Espanola De Medicina Nuclear E Imagen Molecular, vol. 43, no. 4, Article 500021 | Pub Date: 2024-07-01 | DOI: 10.1016/j.remn.2024.500021

H. Şan , Ö. Bayrakçi , B. Çağdaş , M. Serdengeçti , E. Alagöz



Abstract

Purpose

Searching for health information online is a popular way for patients to improve their knowledge of their diseases, and recently developed AI chatbots are probably the easiest means of doing so. The purpose of this study is to analyze the reliability and readability of AI chatbot responses about the most commonly applied radionuclide treatments in cancer patients.

Methods

Basic patient questions, thirty about RAI, PRRT and TARE treatments and twenty-nine about PSMA-TRT, were put one at a time to GPT-4 and Bard in January 2024. The reliability and readability of the responses were assessed using the DISCERN scale, Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKRGL).
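The two readability metrics named in the Methods can be sketched as follows. This is a minimal illustration using the standard published Flesch and Flesch-Kincaid formulas; the abstract does not say which tool the authors used, and the function names, the vowel-group syllable counter, and the sentence splitter here are assumptions made only for this example.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard FRE formula: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid formula: approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    # Crude heuristic (an assumption, not the authors' method):
    # count runs of consecutive vowels, with a minimum of one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) for a piece of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


if __name__ == "__main__":
    w, s, syl = text_counts("The cat sat. It ran!")
    print(flesch_reading_ease(w, s, syl), flesch_kincaid_grade(w, s, syl))
```

Both formulas depend only on average sentence length and average syllables per word, so long sentences and polysyllabic medical terms push the grade level up quickly. Dedicated readability tools count syllables more carefully than the heuristic above, so absolute scores may differ slightly.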

Results

The mean (SD) FKRGL scores for the responses of GPT-4 and Google Bard about RAI, PSMA-TRT, PRRT and TARE treatments were 14.57 (1.19), 14.65 (1.38), 14.25 (1.10), 14.38 (1.2) and 11.49 (1.59), 12.42 (1.71), 11.35 (1.80), 13.01 (1.97), respectively. In terms of readability, the FKRGL scores of the responses of both GPT-4 and Google Bard for all four treatments were above the reading grade level of the general public. The mean (SD) DISCERN scores assigned by a nuclear medicine physician to the responses of GPT-4 and Bard about RAI, PSMA-TRT, PRRT and TARE treatments were 47.86 (5.09), 48.48 (4.22), 46.76 (4.09), 48.33 (5.15) and 51.50 (5.64), 53.44 (5.42), 53 (6.36), 49.43 (5.32), respectively. Based on mean DISCERN scores, the reliability of the responses of GPT-4 and Google Bard about RAI, PSMA-TRT, PRRT and TARE treatments ranged from fair to good. The inter-rater reliability correlation coefficients of the DISCERN scores assigned by GPT-4, Bard and the nuclear medicine physician to the responses of GPT-4 about RAI, PSMA-TRT, PRRT and TARE treatments were 0.512 (95% CI 0.296 to 0.704), 0.695 (95% CI 0.518 to 0.829), 0.687 (95% CI 0.511 to 0.823) and 0.649 (95% CI 0.462 to 0.798), respectively (P<.01). The corresponding coefficients for the responses of Bard about RAI, PSMA-TRT, PRRT and TARE treatments were 0.753 (95% CI 0.602 to 0.863), 0.812 (95% CI 0.686 to 0.899), 0.804 (95% CI 0.677 to 0.894) and 0.671 (95% CI 0.489 to 0.812), respectively (P<.01). Inter-rater reliability for the responses of Bard and GPT-4 about RAI, PSMA-TRT, PRRT and TARE treatments was therefore moderate to good. Further, consulting a nuclear medicine physician was rarely emphasized by either GPT-4 or Google Bard. References were included in some responses of Google Bard, but in none of the GPT-4 responses.
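The qualitative labels in the Results ("fair to good", "moderate to good") can be related to the reported numbers with a small banding sketch. The abstract does not state which cut-offs the authors applied. The thresholds below are assumptions: a commonly used banding of the 16-item DISCERN total (range 16 to 80) and a widely cited guideline for intraclass correlation coefficients. The function names are hypothetical.

```python
def discern_band(total: float) -> str:
    """Map a DISCERN total (16 to 80) to a quality band.

    Assumed cut-offs (not taken from the abstract): below 27 very poor,
    below 39 poor, below 51 fair, below 63 good, otherwise excellent.
    """
    if not 16 <= total <= 80:
        raise ValueError("DISCERN total must lie between 16 and 80")
    if total < 27:
        return "very poor"
    if total < 39:
        return "poor"
    if total < 51:
        return "fair"
    if total < 63:
        return "good"
    return "excellent"


def icc_band(icc: float) -> str:
    """Map an intraclass correlation coefficient to a reliability band.

    Assumed cut-offs (not taken from the abstract): below 0.5 poor,
    below 0.75 moderate, below 0.9 good, otherwise excellent.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


if __name__ == "__main__":
    treatments = ["RAI", "PSMA-TRT", "PRRT", "TARE"]
    gpt4_discern = [47.86, 48.48, 46.76, 48.33]   # means reported in the abstract
    bard_discern = [51.50, 53.44, 53, 49.43]
    gpt4_icc = [0.512, 0.695, 0.687, 0.649]
    bard_icc = [0.753, 0.812, 0.804, 0.671]
    for name, g, b, gi, bi in zip(treatments, gpt4_discern, bard_discern, gpt4_icc, bard_icc):
        print(name, discern_band(g), discern_band(b), icc_band(gi), icc_band(bi))
```

Under these assumed cut-offs, the reported mean DISCERN scores fall in the fair and good bands and the reported coefficients fall in the moderate and good bands, which matches the wording of the abstract.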

Conclusion

Although the information provided by AI chatbots may be acceptable in medical terms, it may not be easy for the general public to read, which can prevent it from being understood. Effective prompts crafted through 'prompt engineering' may make the responses more comprehensible. Since radionuclide treatments fall within nuclear medicine expertise, responses need to name the nuclear medicine physician as the specialist to consult, so that patients and caregivers are guided toward accurate medical advice. Referencing matters for the confidence and satisfaction of patients and caregivers seeking information.


Source journal

Revista Espanola De Medicina Nuclear E Imagen Molecular (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore: 1.10

Self-citation rate: 16.70%

Review time: 24 days

About the journal: The Revista Española de Medicina Nuclear e Imagen Molecular (Spanish Journal of Nuclear Medicine and Molecular Imaging) was founded in 1982 and is the official journal of the Spanish Society of Nuclear Medicine and Molecular Imaging, which has more than 700 members. The Journal publishes 6 regular issues per year, and its main aim is to promote research and continuing education in all fields of Nuclear Medicine. To this end, its principal sections are Originals, Clinical Notes, Images of Interest, and Special Collaboration articles.