Accuracy of Information given by ChatGPT for Patients with Inflammatory Bowel Disease in Relation to ECCO Guidelines.

Martina Sciberras, Yvette Farrugia, Hannah Gordon, Federica Furfaro, Mariangela Allocca, Joana Torres, Naila Arebi, Gionata Fiorino, Marietta Iacucci, Bram Verstockt, Fernando Magro, Kostas Katsanos, Josef Busuttil, Katya De Giovanni, Valerie Anne Fenech, Stefania Chetcuti Zammit, Pierre Ellul
{"title":"Accuracy of Information given by ChatGPT for Patients with Inflammatory Bowel Disease in Relation to ECCO Guidelines.","authors":"Martina Sciberras, Yvette Farrugia, Hannah Gordon, Federica Furfaro, Mariangela Allocca, Joana Torres, Naila Arebi, Gionata Fiorino, Marietta Iacucci, Bram Verstockt, Fernando Magro, Kostas Katsanos, Josef Busuttil, Katya De Giovanni, Valerie Anne Fenech, Stefania Chetcuti Zammit, Pierre Ellul","doi":"10.1093/ecco-jcc/jjae040","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>As acceptance of artificial intelligence [AI] platforms increases, more patients will consider these tools as sources of information. The ChatGPT architecture utilizes a neural network to process natural language, thus generating responses based on the context of input text. The accuracy and completeness of ChatGPT3.5 in the context of inflammatory bowel disease [IBD] remains unclear.</p><p><strong>Methods: </strong>In this prospective study, 38 questions worded by IBD patients were inputted into ChatGPT3.5. The following topics were covered: [1] Crohn's disease [CD], ulcerative colitis [UC], and malignancy; [2] maternal medicine; [3] infection and vaccination; and [4] complementary medicine. Responses given by ChatGPT were assessed for accuracy [1-completely incorrect to 5-completely correct] and completeness [3-point Likert scale; range 1-incomplete to 3-complete] by 14 expert gastroenterologists, in comparison with relevant ECCO guidelines.</p><p><strong>Results: </strong>In terms of accuracy, most replies [84.2%] had a median score of ≥4 (interquartile range [IQR]: 2) and a mean score of 3.87 [SD: ±0.6]. For completeness, 34.2% of the replies had a median score of 3 and 55.3% had a median score of between 2 and <3. Overall, the mean rating was 2.24 [SD: ±0.4, median: 2, IQR: 1]. 
Though groups 3 and 4 had a higher mean for both accuracy and completeness, there was no significant scoring variation between the four question groups [Kruskal-Wallis test p > 0.05]. However, statistical analysis for the different individual questions revealed a significant difference for both accuracy [p < 0.001] and completeness [p < 0.001]. The questions which rated the highest for both accuracy and completeness were related to smoking, while the lowest rating was related to screening for malignancy and vaccinations especially in the context of immunosuppression and family planning.</p><p><strong>Conclusion: </strong>This is the first study to demonstrate the capability of an AI-based system to provide accurate and comprehensive answers to real-world patient queries in IBD. AI systems may serve as a useful adjunct for patients, in addition to standard of care in clinics and validated patient information resources. However, responses in specialist areas may deviate from evidence-based guidance and the replies need to give more firm advice.</p>","PeriodicalId":94074,"journal":{"name":"Journal of Crohn's & colitis","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Crohn's & colitis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/ecco-jcc/jjae040","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: As acceptance of artificial intelligence [AI] platforms increases, more patients will consider these tools as sources of information. The ChatGPT architecture uses a neural network to process natural language, generating responses based on the context of the input text. The accuracy and completeness of ChatGPT3.5 in the context of inflammatory bowel disease [IBD] remain unclear.

Methods: In this prospective study, 38 questions worded by patients with IBD were input into ChatGPT3.5. The following topics were covered: [1] Crohn's disease [CD], ulcerative colitis [UC], and malignancy; [2] maternal medicine; [3] infection and vaccination; and [4] complementary medicine. ChatGPT's responses were assessed by 14 expert gastroenterologists for accuracy [1, completely incorrect, to 5, completely correct] and completeness [3-point Likert scale; 1, incomplete, to 3, complete], in comparison with the relevant ECCO guidelines.

Results: In terms of accuracy, most replies [84.2%] had a median score of ≥4 (interquartile range [IQR]: 2), with a mean score of 3.87 [SD: ±0.6]. For completeness, 34.2% of the replies had a median score of 3, and 55.3% had a median score of at least 2 but below 3. Overall, the mean rating was 2.24 [SD: ±0.4; median: 2; IQR: 1]. Although groups 3 and 4 had higher means for both accuracy and completeness, there was no significant variation in scores between the four question groups [Kruskal-Wallis test, p > 0.05]. However, analysis of the individual questions revealed significant differences in both accuracy [p < 0.001] and completeness [p < 0.001]. The questions rated highest for both accuracy and completeness related to smoking, whereas the lowest-rated questions concerned screening for malignancy and vaccination, especially in the context of immunosuppression and family planning.
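The group-level comparison above relies on the Kruskal-Wallis rank test. As an illustration only (the scores below are hypothetical placeholders, not the study's actual ratings), the H statistic for four question groups can be computed as in this minimal sketch:

```python
from itertools import chain

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (with tie correction) for k independent samples."""
    data = list(chain.from_iterable(groups))
    n = len(data)
    # Assign 1-based ranks, giving tied values the average of their ranks.
    order = sorted(range(n), key=lambda i: data[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and data[order[j + 1]] == data[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    h = 0.0
    start = 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])
        h += r_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Correct for ties: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_counts = {}
    for v in data:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else h

# Hypothetical accuracy ratings (1-5 Likert) for the four question groups.
groups = [
    [4, 4, 5, 3, 4],   # [1] CD, UC, and malignancy
    [4, 3, 4, 4, 5],   # [2] maternal medicine
    [5, 4, 4, 5, 4],   # [3] infection and vaccination
    [4, 5, 5, 4, 4],   # [4] complementary medicine
]
print(f"H = {kruskal_wallis_h(groups):.3f}")
```

The resulting H statistic is then compared against a chi-squared distribution with k - 1 degrees of freedom to obtain the p-value reported in the abstract; a p > 0.05 across the four groups indicates no significant between-group difference.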

Conclusion: This is the first study to demonstrate the capability of an AI-based system to provide accurate and comprehensive answers to real-world patient queries in IBD. AI systems may serve as a useful adjunct for patients, alongside standard of care in clinics and validated patient information resources. However, responses in specialist areas may deviate from evidence-based guidance, and the replies would need to offer firmer advice.
