Assessment of Pathology Domain-Specific Knowledge of ChatGPT and Comparison to Human Performance.

Andrew Y Wang, Sherman Lin, Christopher Tran, Robert J Homer, Dan Wilsdon, Joanna C Walsh, Emily A Goebel, Irene Sansano, Snehal Sonawane, Vincent Cockenpot, Sanjay Mukhopadhyay, Toros Taskin, Nusrat Zahra, Luca Cima, Orhan Semerci, Birsen Gizem Özamrak, Pallavi Mishra, Naga Sarika Vennavalli, Po-Hsuan Cameron Chen, Matthew J Cecchini
{"title":"Assessment of Pathology Domain-Specific Knowledge of ChatGPT and Comparison to Human Performance.","authors":"Andrew Y Wang, Sherman Lin, Christopher Tran, Robert J Homer, Dan Wilsdon, Joanna C Walsh, Emily A Goebel, Irene Sansano, Snehal Sonawane, Vincent Cockenpot, Sanjay Mukhopadhyay, Toros Taskin, Nusrat Zahra, Luca Cima, Orhan Semerci, Birsen Gizem Özamrak, Pallavi Mishra, Naga Sarika Vennavalli, Po-Hsuan Cameron Chen, Matthew J Cecchini","doi":"10.5858/arpa.2023-0296-OA","DOIUrl":null,"url":null,"abstract":"<p><strong>Context.—: </strong>Artificial intelligence algorithms hold the potential to fundamentally change many aspects of society. Application of these tools, including the publicly available ChatGPT, has demonstrated impressive domain-specific knowledge in many areas, including medicine.</p><p><strong>Objectives.—: </strong>To understand the level of pathology domain-specific knowledge for ChatGPT using different underlying large language models, GPT-3.5 and the updated GPT-4.</p><p><strong>Design.—: </strong>An international group of pathologists (n = 15) was recruited to generate pathology-specific questions at a similar level to those that could be seen on licensing (board) examinations. The questions (n = 15) were answered by GPT-3.5, GPT-4, and a staff pathologist who recently passed their Canadian pathology licensing exams. Participants were instructed to score answers on a 5-point scale and to predict which answer was written by ChatGPT.</p><p><strong>Results.—: </strong>GPT-3.5 performed at a similar level to the staff pathologist, while GPT-4 outperformed both. The overall score for both GPT-3.5 and GPT-4 was within the range of meeting expectations for a trainee writing licensing examinations. In all but one question, the reviewers were able to correctly identify the answers generated by GPT-3.5.</p><p><strong>Conclusions.—: </strong>By demonstrating the ability of ChatGPT to answer pathology-specific questions at a level similar to (GPT-3.5) or exceeding (GPT-4) a trained pathologist, this study highlights the potential of large language models to be transformative in this space. In the future, more advanced iterations of these algorithms with increased domain-specific knowledge may have the potential to assist pathologists and enhance pathology resident training.</p>","PeriodicalId":93883,"journal":{"name":"Archives of pathology & laboratory medicine","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Archives of pathology & laboratory medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5858/arpa.2023-0296-OA","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Context.—: Artificial intelligence algorithms hold the potential to fundamentally change many aspects of society. Application of these tools, including the publicly available ChatGPT, has demonstrated impressive domain-specific knowledge in many areas, including medicine.

Objectives.—: To understand the level of pathology domain-specific knowledge for ChatGPT using different underlying large language models, GPT-3.5 and the updated GPT-4.

Design.—: An international group of pathologists (n = 15) was recruited to generate pathology-specific questions at a similar level to those that could be seen on licensing (board) examinations. The questions (n = 15) were answered by GPT-3.5, GPT-4, and a staff pathologist who recently passed their Canadian pathology licensing exams. Participants were instructed to score answers on a 5-point scale and to predict which answer was written by ChatGPT.
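The paper does not publish the querying workflow, but the design above amounts to posing each free-text, board-style question to both models and collecting their answers for blinded scoring. A minimal illustrative sketch is given below, assuming the publicly available OpenAI Python client and standard model names; the example question, helper function, and model identifiers are assumptions for illustration, not details taken from the study.

```python
# Illustrative sketch (not the study's code): pose one board-style question
# to GPT-3.5 and GPT-4 and collect the free-text answers for later scoring.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical example question; the study's 15 questions were written by pathologists.
QUESTION = (
    "Describe the key histologic features that distinguish follicular "
    "lymphoma from reactive follicular hyperplasia."
)

def ask(model: str, question: str) -> str:
    """Return the model's free-text answer to a single question."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Collect one answer per model; these would be scored blindly on a 5-point
# scale alongside the staff pathologist's answer.
answers = {model: ask(model, QUESTION) for model in ("gpt-3.5-turbo", "gpt-4")}
for model, answer in answers.items():
    print(f"--- {model} ---\n{answer}\n")
```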

Results.—: GPT-3.5 performed at a similar level to the staff pathologist, while GPT-4 outperformed both. The overall score for both GPT-3.5 and GPT-4 was within the range of meeting expectations for a trainee writing licensing examinations. In all but one question, the reviewers were able to correctly identify the answers generated by GPT-3.5.

Conclusions.—: By demonstrating the ability of ChatGPT to answer pathology-specific questions at a level similar to (GPT-3.5) or exceeding (GPT-4) a trained pathologist, this study highlights the potential of large language models to be transformative in this space. In the future, more advanced iterations of these algorithms with increased domain-specific knowledge may have the potential to assist pathologists and enhance pathology resident training.
