The promising role of chatbots in keratorefractive surgery patient education

IF 1.2 · CAS Zone 4 (Medicine) · JCR Q3 (Ophthalmology)

Journal Francais D Ophtalmologie Pub Date : 2025-02-01 DOI:10.1016/j.jfo.2024.104381

L. Doğan, Z. Özer Özcan, İ. Edhem Yılmaz

Volume 48, Issue 2, Article 104381. Full text: https://www.sciencedirect.com/science/article/pii/S0181551224003267

Citations: 0

Abstract

Purpose

To evaluate the appropriateness, understandability, actionability, and readability of responses provided by ChatGPT-3.5, Bard, and Bing Chat to frequently asked questions about keratorefractive surgery (KRS).

Method

Thirty-eight frequently asked questions about KRS were each submitted three times to fresh ChatGPT-3.5, Bard, and Bing Chat sessions. Two experienced refractive surgeons categorized the chatbots’ responses by appropriateness, and the accuracy of the responses was assessed using the Structure of the Observed Learning Outcome (SOLO) taxonomy. The Flesch Reading Ease (FRE) and the Coleman-Liau Index (CLI) were used to evaluate the readability of the responses. Furthermore, the understandability and actionability of the responses were evaluated using the Patient Education Materials Assessment Tool (PEMAT).
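The two readability indices named in the Method can be computed from simple text counts. The sketch below uses the standard published formulas for FRE and CLI. The abstract does not say which software the authors used, so the function names, the tokenization rules, and especially the vowel-group syllable counter are assumptions made for illustration, and scores from dedicated readability tools may differ slightly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic (an assumption; real tools use better rules)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent 'e' ("surface"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str):
    """Return (sentences, words, letters, syllables) for an English text.

    Sentences are split naively on . ! ? so decimals such as "3.5"
    would be miscounted; this is acceptable only for a sketch.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), letters, syllables


def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    n_sent, n_words, _, n_syll = text_stats(text)
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)


def coleman_liau_index(text: str) -> float:
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S per 100 words."""
    n_sent, n_words, n_letters, _ = text_stats(text)
    letters_per_100 = n_letters / n_words * 100
    sentences_per_100 = n_sent / n_words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran."
    print(flesch_reading_ease(sample), coleman_liau_index(sample))
```

A higher FRE score means easier text, while the CLI approximates a US school grade level, so the two indices move in opposite directions as a response becomes harder to read.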

Results

The appropriateness of the ChatGPT-3.5, Bard, and Bing Chat responses was 86.8% (33/38), 84.2% (32/38), and 81.5% (31/38), respectively (P > 0.05). According to the SOLO taxonomy, ChatGPT-3.5 achieved the highest mean accuracy (3.91 ± 0.44), followed by Bard (3.64 ± 0.61) and Bing Chat (3.19 ± 0.55). Bard scored better than the other chatbots for both understandability (mean PEMAT-U score: ChatGPT-3.5 68.5%, Bard 78.6%, Bing Chat 67.1%; P < 0.05) and actionability (mean PEMAT-A score: ChatGPT-3.5 62.6%, Bard 72.4%, Bing Chat 60.9%; P < 0.05). Two readability analyses showed that Bing Chat had the highest readability, followed by ChatGPT-3.5 and Bard; however, the understandability and readability of the responses remained more difficult than the recommended level.
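The appropriateness percentages follow directly from the reported counts out of 38 questions. The short sketch below recomputes them; the variable and function names are illustrative assumptions, and only the counts come from the abstract. Note that 31/38 is about 81.58%, which the abstract reports as 81.5%, so the published figure appears to be truncated rather than rounded.

```python
# Counts of appropriate responses reported in the abstract, out of 38 questions.
TOTAL_QUESTIONS = 38
appropriate_counts = {"ChatGPT-3.5": 33, "Bard": 32, "Bing Chat": 31}


def appropriateness_rate(appropriate: int, total: int) -> float:
    """Percentage of questions whose response was judged appropriate."""
    return 100 * appropriate / total


rates = {
    name: appropriateness_rate(count, TOTAL_QUESTIONS)
    for name, count in appropriate_counts.items()
}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:.2f}%")
```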

Conclusion

Artificial intelligence-supported chatbots have the potential to provide detailed and appropriate responses to questions about KRS at an acceptable level. Although chatbots are promising for patient education in KRS, they require further progress, especially in readability and understandability.

Source journal

Journal Francais D Ophtalmologie (Medicine: Ophthalmology)

CiteScore: 1.10

Self-citation rate: 8.30%

Articles published: 317

Review time: 49 days

Journal description: The Journal français d'ophtalmologie, official publication of the French Society of Ophthalmology, serves the French-speaking community by publishing excellent research articles, communications of the French Society of Ophthalmology, in-depth reviews, position papers, letters received by the editor, and a rich image bank in each issue. The scientific quality is guaranteed through unbiased peer review, and the journal is a member of the Committee on Publication Ethics (COPE). The editors strongly discourage editorial misconduct; in particular, if duplicative text from published sources is identified without proper citation, the submission will not be considered for peer review and will be returned to the authors or immediately rejected.