Reliability and readability analysis of ChatGPT-4 and Google Bard as a patient information source for the most commonly applied radionuclide treatments in cancer patients
IF 1.6 · CAS Zone 4 (Medicine) · JCR Q3 · RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
H. Şan , Ö. Bayrakçi , B. Çağdaş , M. Serdengeçti , E. Alagöz
Revista Española de Medicina Nuclear e Imagen Molecular, 43(4), Article 500021. Published 2024-07-01. DOI: 10.1016/j.remn.2024.500021. https://www.sciencedirect.com/science/article/pii/S2253654X24000295
Abstract
Purpose
Searching for online health information is a popular way for patients to learn more about their diseases, and recently developed AI chatbots are probably the easiest tool for this purpose. This study aims to analyze the reliability and readability of AI chatbot responses about the radionuclide treatments most commonly applied in cancer patients.
Methods
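In January 2024, basic patient questions were asked one by one to GPT-4 and Google Bard: thirty about radioactive iodine (RAI), peptide receptor radionuclide therapy (PRRT) and transarterial radioembolization (TARE) treatments, and twenty-nine about prostate-specific membrane antigen-targeted radionuclide therapy (PSMA-TRT). The reliability and readability of the responses were assessed using the DISCERN scale, the Flesch Reading Ease (FRE) score and the Flesch-Kincaid Reading Grade Level (FKRGL).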
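FRE and FKRGL are computed from word, sentence and syllable counts with the standard Flesch formulas, and a DISCERN total sums 16 items scored 1 to 5 (range 16 to 80). The abstract does not say which software produced the scores or which DISCERN cut-offs define "fair" and "good", so the Python sketch below is illustrative only: all function names are hypothetical, the syllable counter is a naive vowel-group heuristic (real readability tools count syllables differently), and the DISCERN bands (63 to 80 excellent, 51 to 62 good, 39 to 50 fair, 27 to 38 poor, 16 to 26 very poor) are an assumed convention that is common in the literature.

```python
import re


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups (assumed heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final silent 'e' as non-syllabic, e.g. "make" -> 1.
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) for a plain-text response."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    # Each run of terminal punctuation ends one sentence; assume at least one.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), sentences, syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # FKRGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def discern_band(total: float) -> str:
    """Map a DISCERN total (16 items x 1-5 points) to an assumed quality band."""
    if not 16 <= total <= 80:
        raise ValueError("DISCERN totals range from 16 to 80")
    if total >= 63:
        return "excellent"
    if total >= 51:
        return "good"
    if total >= 39:
        return "fair"
    if total >= 27:
        return "poor"
    return "very poor"


if __name__ == "__main__":
    w, s, syl = text_counts("The cat sat. The dog ran!")
    print(w, s, syl)  # 6 2 6
    print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
    print(discern_band(47.86), discern_band(53.44))  # fair good
```

Under these assumed cut-offs, the physician's mean DISCERN scores reported in the Results (46.76 to 53.44) fall in the fair and good bands, consistent with the abstract's "fair to good" summary.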
Results
For GPT-4, the mean (SD) FKRGL scores of the responses about RAI, PSMA-TRT, PRRT and TARE treatments were 14.57 (1.19), 14.65 (1.38), 14.25 (1.10) and 14.38 (1.2), respectively; for Google Bard, they were 11.49 (1.59), 12.42 (1.71), 11.35 (1.80) and 13.01 (1.97). For all four treatments, the FKRGL scores of both chatbots' responses were above the reading grade level of the general public.

The mean (SD) DISCERN scores assigned by a nuclear medicine physician to the GPT-4 responses about RAI, PSMA-TRT, PRRT and TARE treatments were 47.86 (5.09), 48.48 (4.22), 46.76 (4.09) and 48.33 (5.15), respectively; for the Bard responses, they were 51.50 (5.64), 53.44 (5.42), 53 (6.36) and 49.43 (5.32). Based on these mean DISCERN scores, the reliability of both chatbots' responses ranged from fair to good.

DISCERN scores were also assigned by GPT-4 and Bard, and inter-rater reliability was assessed among the three raters (GPT-4, Bard and the nuclear medicine physician). For the GPT-4 responses about RAI, PSMA-TRT, PRRT and TARE treatments, the inter-rater reliability correlation coefficients were 0.512 (95% CI 0.296–0.704), 0.695 (95% CI 0.518–0.829), 0.687 (95% CI 0.511–0.823) and 0.649 (95% CI 0.462–0.798), respectively (P < .01). For the Bard responses, they were 0.753 (95% CI 0.602–0.863), 0.812 (95% CI 0.686–0.899), 0.804 (95% CI 0.677–0.894) and 0.671 (95% CI 0.489–0.812), respectively (P < .01). Inter-rater reliability for the responses of both chatbots was therefore moderate to good.

Consulting a nuclear medicine physician was rarely emphasized in the responses of either GPT-4 or Google Bard. Some Google Bard responses included references, whereas none of the GPT-4 responses did.
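The abstract reports inter-rater reliability coefficients for three raters but does not state which intraclass correlation model was used. As an illustration only, the Python sketch below assumes the common two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation, applied to a matrix in which each row is one chatbot response and each column is one rater's DISCERN total (for example GPT-4, Bard and the physician). The function names are hypothetical, confidence intervals are not computed, and the interpretation bands (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, 0.9 and above excellent, after Koo and Li) are an assumed guideline rather than the authors' stated criteria.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score that rater j gave to response i.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n, k >= 2")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)  # responses
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)  # raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def icc_band(icc: float) -> str:
    """Assumed interpretation bands (Koo and Li guideline)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Toy data, not from the study: 6 responses x 4 raters.
    example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2_1(example), 2))  # 0.29
    print(icc_band(0.512), icc_band(0.812))  # moderate good
```

Under the assumed bands, the coefficients reported above (0.512 to 0.812) fall in the moderate and good ranges, matching the abstract's summary. Because rater bias counts against agreement in ICC(2,1), a rater who scores every response systematically higher lowers the coefficient even when the ranking of responses is identical.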
Conclusion
Although the information provided by AI chatbots may be medically acceptable, it may not be easy for the general public to read, which can limit its comprehensibility. Effective prompts developed through prompt engineering may make the responses more comprehensible. Because radionuclide treatments fall within the expertise of nuclear medicine, responses should name the nuclear medicine physician as the consultant, so that patients and caregivers are directed to accurate medical advice. Citing references matters for the confidence and satisfaction of patients and caregivers seeking information.
About the journal:
The Revista Española de Medicina Nuclear e Imagen Molecular (Spanish Journal of Nuclear Medicine and Molecular Imaging) was founded in 1982 and is the official journal of the Spanish Society of Nuclear Medicine and Molecular Imaging, which has more than 700 members.
The journal publishes six regular issues per year, and its main aim is to promote research and continuing education in all fields of nuclear medicine. To this end, its principal sections are Originals, Clinical Notes, Images of Interest and Special Collaboration articles.