Evaluation of ChatGPT's Usefulness and Accuracy in Diagnostic Surgical Pathology.

Vincenzo Guastafierro, Devin Nicole Corbitt, Alessandra Bressan, Bethania Fernandes, Ömer Mintemur, Francesca Magnoli, Susanna Ronchi, Stefano La Rosa, Silvia Uccella, Salvatore Lorenzo Renne
{"title":"Evaluation of ChatGPT's Usefulness and Accuracy in Diagnostic Surgical Pathology.","authors":"Vincenzo Guastafierro, Devin Nicole Corbitt, Alessandra Bressan, Bethania Fernandes, Ömer Mintemur, Francesca Magnoli, Susanna Ronchi, Stefano La Rosa, Silvia Uccella, Salvatore Lorenzo Renne","doi":"10.1101/2024.03.12.24304153","DOIUrl":null,"url":null,"abstract":"ChatGPT is an artificial intelligence capable of processing and generating human-like language. ChatGPT's role within clinical patient care and medical education has been explored; however, assessment of its potential in supporting histopathological diagnosis is lacking. In this study, we assessed ChatGPT's reliability in addressing pathology-related diagnostic questions across 10 subspecialties, as well as its ability to provide scientific references. We created five clinico-pathological scenarios for each subspecialty, posed to ChatGPT as open-ended or multiple-choice questions. Each question either asked for scientific references or not. Outputs were assessed by six pathologists according to: 1) usefulness in supporting the diagnosis and 2) absolute number of errors. All references were manually verified. We used directed acyclic graphs and structural causal models to determine the effect of each scenario type, field, question modality and pathologist evaluation. Overall, we yielded 894 evaluations. ChatGPT provided useful answers in 62.2% of cases. 32.1% of outputs contained no errors, while the remaining contained at least one error (maximum 18). ChatGPT provided 214 bibliographic references: 70.1% were correct, 12.1% were inaccurate and 17.8% did not correspond to a publication. Scenario variability had the greatest impact on ratings, followed by prompting strategy. Finally, latent knowledge across the fields showed minimal variation. In conclusion, ChatGPT provided useful responses in one-third of cases, but the number of errors and variability highlight that it is not yet adequate for everyday diagnostic practice and should be used with discretion as a support tool. The lack of thoroughness in providing references also suggests caution should be employed even when used as a self-learning tool. It is essential to recognize the irreplaceable role of human experts in synthesizing images, clinical data and experience for the intricate task of histopathological diagnosis.","PeriodicalId":501528,"journal":{"name":"medRxiv - Pathology","volume":"44 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Pathology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.03.12.24304153","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

ChatGPT is an artificial intelligence capable of processing and generating human-like language. ChatGPT's role in clinical patient care and medical education has been explored; however, an assessment of its potential to support histopathological diagnosis is lacking. In this study, we assessed ChatGPT's reliability in addressing pathology-related diagnostic questions across 10 subspecialties, as well as its ability to provide scientific references. We created five clinico-pathological scenarios for each subspecialty, which were posed to ChatGPT as either open-ended or multiple-choice questions; each question either did or did not request scientific references. Outputs were assessed by six pathologists according to: 1) usefulness in supporting the diagnosis and 2) absolute number of errors. All references were manually verified. We used directed acyclic graphs and structural causal models to estimate the effects of scenario type, field, question modality and evaluating pathologist on the ratings. In total, this yielded 894 evaluations. ChatGPT provided useful answers in 62.2% of cases. Of the outputs, 32.1% contained no errors, while the remainder contained at least one error (up to a maximum of 18). ChatGPT provided 214 bibliographic references: 70.1% were correct, 12.1% were inaccurate and 17.8% did not correspond to a publication. Scenario variability had the greatest impact on ratings, followed by prompting strategy. Finally, latent knowledge across the fields showed minimal variation. In conclusion, ChatGPT provided useful responses in roughly two-thirds of cases, but the number of errors and the variability of its outputs highlight that it is not yet adequate for everyday diagnostic practice and should be used with discretion as a support tool. The lack of thoroughness in providing references also suggests that caution should be employed even when it is used as a self-learning tool. It is essential to recognize the irreplaceable role of human experts in synthesizing images, clinical data and experience for the intricate task of histopathological diagnosis.
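
The abstract mentions that directed acyclic graphs and structural causal models were used to disentangle the effects of scenario type, field, question modality and evaluating pathologist on the ratings. The authors' actual model specification is not given here; the sketch below is only a minimal illustration of how such a causal structure might be encoded, with hypothetical variable names and edges chosen for clarity rather than taken from the paper.

```python
# Illustrative sketch only: variable names and edges are assumptions,
# not the authors' published model.
import networkx as nx

# Hypothetical causal structure: scenario, subspecialty field, question
# modality and the evaluating pathologist may all influence the error count
# and the usefulness rating of a ChatGPT output.
dag = nx.DiGraph()
dag.add_edges_from([
    ("scenario",    "errors"),
    ("field",       "errors"),
    ("modality",    "errors"),
    ("scenario",    "usefulness"),
    ("field",       "usefulness"),
    ("modality",    "usefulness"),
    ("pathologist", "usefulness"),
    ("errors",      "usefulness"),  # more errors may lower perceived usefulness
])

# Sanity check: the graph must be acyclic to serve as a DAG.
assert nx.is_directed_acyclic_graph(dag)

# The parents of the outcome indicate which factors a structural causal model
# would include when estimating their effects on the usefulness rating.
print(sorted(dag.predecessors("usefulness")))
```

Under these assumptions, a structural causal model built on such a DAG would let one compare, for example, how much of the rating variability is attributable to the scenario versus the prompting strategy, which is the kind of decomposition the abstract reports.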