Piotr Strzalkowski, Alicja Strzalkowska, Jay Chhablani, Kristina Pfau, Marie-Hélène Errera, Mathias Roth, Friederike Schaub, Nikolaos E Bechrakis, Hans Hoerauf, Constantin Reiter, Alexander K Schuster, Gerd Geerling, Rainer Guthoff
{"title":"评估 ChatGPT-4 和 Google Gemini 在提供视网膜脱离信息方面的准确性和可读性:一项多中心专家比较研究。","authors":"Piotr Strzalkowski, Alicja Strzalkowska, Jay Chhablani, Kristina Pfau, Marie-Hélène Errera, Mathias Roth, Friederike Schaub, Nikolaos E Bechrakis, Hans Hoerauf, Constantin Reiter, Alexander K Schuster, Gerd Geerling, Rainer Guthoff","doi":"10.1186/s40942-024-00579-9","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment.</p><p><strong>Methods: </strong>Comparative study analyzing responses from ChatGPT-4 and Google Gemini to 13 retinal detachment questions, categorized by difficulty levels (D1, D2, D3). Masked responses were reviewed by ten vitreoretinal specialists and rated on correctness, errors, thematic accuracy, coherence, and overall quality grading. Analysis included Flesch Readability Ease Score, word and sentence counts.</p><p><strong>Results: </strong>Both Artificial Intelligence tools required college-level understanding for all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades in the easiest (p = 0.03) and hardest levels (p = 0.0002), showing a lower grade as question difficulty increased.</p><p><strong>Conclusions: </strong>ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients require higher education for comprehension. The implementation of AI tools may contribute to improving medical care by providing accurate and relevant healthcare information quickly.</p>","PeriodicalId":14289,"journal":{"name":"International Journal of Retina and Vitreous","volume":"10 1","pages":"61"},"PeriodicalIF":1.9000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367851/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study.\",\"authors\":\"Piotr Strzalkowski, Alicja Strzalkowska, Jay Chhablani, Kristina Pfau, Marie-Hélène Errera, Mathias Roth, Friederike Schaub, Nikolaos E Bechrakis, Hans Hoerauf, Constantin Reiter, Alexander K Schuster, Gerd Geerling, Rainer Guthoff\",\"doi\":\"10.1186/s40942-024-00579-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment.</p><p><strong>Methods: </strong>Comparative study analyzing responses from ChatGPT-4 and Google Gemini to 13 retinal detachment questions, categorized by difficulty levels (D1, D2, D3). 
Masked responses were reviewed by ten vitreoretinal specialists and rated on correctness, errors, thematic accuracy, coherence, and overall quality grading. Analysis included Flesch Readability Ease Score, word and sentence counts.</p><p><strong>Results: </strong>Both Artificial Intelligence tools required college-level understanding for all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades in the easiest (p = 0.03) and hardest levels (p = 0.0002), showing a lower grade as question difficulty increased.</p><p><strong>Conclusions: </strong>ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients require higher education for comprehension. The implementation of AI tools may contribute to improving medical care by providing accurate and relevant healthcare information quickly.</p>\",\"PeriodicalId\":14289,\"journal\":{\"name\":\"International Journal of Retina and Vitreous\",\"volume\":\"10 1\",\"pages\":\"61\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2024-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367851/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Retina and Vitreous\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s40942-024-00579-9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Retina and Vitreous","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s40942-024-00579-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study.
Background: Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment.
Methods: Comparative study analyzing responses from ChatGPT-4 and Google Gemini to 13 questions about retinal detachment, categorized by difficulty level (D1, D2, D3). Masked responses were reviewed by ten vitreoretinal specialists and rated for correctness, errors, thematic accuracy, coherence, and overall quality. Analysis included the Flesch Reading Ease score as well as word and sentence counts.
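For context, the Flesch Reading Ease score referenced above is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), with values of roughly 30-50 corresponding to college-level text. The sketch below is a minimal Python illustration of that formula, not the study's actual tooling; the vowel-group syllable counter is a crude heuristic and the sample answer text is a made-up placeholder.

```python
# Minimal sketch of a Flesch Reading Ease calculation (illustrative only).
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic for English syllable counting."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1  # drop a typical silent final 'e'
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Placeholder answer text, not taken from either chatbot's actual output.
answer = ("Retinal detachment occurs when the retina separates from the "
          "underlying tissue. Urgent surgical treatment is usually required.")
print(round(flesch_reading_ease(answer), 1))  # scores near 30-50 indicate college-level reading difficulty
```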
Results: Both artificial intelligence tools required college-level reading comprehension across all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on the most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades at the easiest (p = 0.03) and hardest levels (p = 0.0002), although grades declined for both tools as question difficulty increased.
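The abstract does not state which statistical test produced these p-values. Purely as an illustration of how ratings from ten masked reviewers might be compared between the two tools, here is a hypothetical Python sketch using a Mann-Whitney U test; all rating values are invented for the example and do not come from the study.

```python
# Hypothetical comparison of expert quality ratings for one question
# (illustrative values only; the study's actual test is not specified here).
from scipy.stats import mannwhitneyu

# One rating per specialist for a single question (placeholder data)
chatgpt4_ratings = [5, 4, 5, 4, 5, 5, 4, 5, 4, 5]
gemini_ratings   = [4, 3, 4, 4, 3, 4, 3, 4, 4, 3]

stat, p_value = mannwhitneyu(chatgpt4_ratings, gemini_ratings, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")  # p < 0.05 would suggest a difference in ratings
```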
Conclusions: ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients need a relatively high level of education to comprehend them. Implementing AI tools may help improve medical care by quickly providing accurate and relevant healthcare information.
About the journal:
International Journal of Retina and Vitreous focuses on the ophthalmic subspecialty of vitreoretinal disorders. The journal presents original articles on new approaches to diagnosis, outcomes of clinical trials, innovations in pharmacological therapy and surgical techniques, as well as basic science advances that impact clinical practice. Topical areas include, but are not limited to:
- Imaging of the retina, choroid and vitreous
- Innovations in optical coherence tomography (OCT)
- Small-gauge vitrectomy, retinal detachment, chromovitrectomy
- Electroretinography (ERG), microperimetry, other functional tests
- Intraocular tumors
- Retinal pharmacotherapy & drug delivery
- Diabetic retinopathy & other vascular diseases
- Age-related macular degeneration (AMD) & other macular entities