Artificial intelligence chatbot vs pathology faculty and residents: Real-world clinical questions from a genitourinary treatment planning conference.

IF 2.3 · JCR Q2 (Pathology) · CAS Medicine Q4
Matthew X Luo, Adam Lyle, Phillip Bennett, Daniel Albertson, Deepika Sirohi, Benjamin L Maughan, Valarie McMurtry, Jonathon Mahlow
{"title":"Artificial intelligence chatbot vs pathology faculty and residents: Real-world clinical questions from a genitourinary treatment planning conference.","authors":"Matthew X Luo, Adam Lyle, Phillip Bennett, Daniel Albertson, Deepika Sirohi, Benjamin L Maughan, Valarie McMurtry, Jonathon Mahlow","doi":"10.1093/ajcp/aqae078","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Artificial intelligence (AI)-based chatbots have demonstrated accuracy in a variety of fields, including medicine, but research has yet to substantiate their accuracy and clinical relevance. We evaluated an AI chatbot's answers to questions posed during a treatment planning conference.</p><p><strong>Methods: </strong>Pathology residents, pathology faculty, and an AI chatbot (OpenAI ChatGPT [January 30, 2023, release]) answered a questionnaire curated from a genitourinary subspecialty treatment planning conference. Results were evaluated by 2 blinded adjudicators: a clinician expert and a pathology expert. Scores were based on accuracy and clinical relevance.</p><p><strong>Results: </strong>Overall, faculty scored highest (4.75), followed by the AI chatbot (4.10), research-prepared residents (3.50), and unprepared residents (2.87). The AI chatbot scored statistically significantly better than unprepared residents (P = .03) but not statistically significantly different from research-prepared residents (P = .33) or faculty (P = .30). Residents did not statistically significantly improve after research (P = .39), and faculty performed statistically significantly better than both resident categories (unprepared, P < .01; research prepared, P = .01).</p><p><strong>Conclusions: </strong>The AI chatbot gave answers to medical questions that were comparable in accuracy and clinical relevance to pathology faculty, suggesting promise for further development. Serious concerns remain, however, that without the ability to provide support with references, AI will face legitimate scrutiny as to how it can be integrated into medical decision-making.</p>","PeriodicalId":7506,"journal":{"name":"American journal of clinical pathology","volume":null,"pages":null},"PeriodicalIF":2.3000,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American journal of clinical pathology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1093/ajcp/aqae078","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PATHOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Objectives: Artificial intelligence (AI)-based chatbots have demonstrated accuracy in a variety of fields, including medicine, but research has yet to substantiate their accuracy and clinical relevance in real-world clinical settings. We evaluated an AI chatbot's answers to questions posed during a genitourinary treatment planning conference.

Methods: Pathology residents, pathology faculty, and an AI chatbot (OpenAI ChatGPT [January 30, 2023, release]) answered a questionnaire curated from a genitourinary subspecialty treatment planning conference. Results were evaluated by 2 blinded adjudicators: a clinician expert and a pathology expert. Scores were based on accuracy and clinical relevance.

Results: Overall, faculty scored highest (4.75), followed by the AI chatbot (4.10), research-prepared residents (3.50), and unprepared residents (2.87). The AI chatbot scored statistically significantly better than unprepared residents (P = .03) but not statistically significantly different from research-prepared residents (P = .33) or faculty (P = .30). Residents did not statistically significantly improve after research preparation (P = .39), and faculty performed statistically significantly better than both resident categories (unprepared, P < .01; research-prepared, P = .01).

Conclusions: The AI chatbot's answers to medical questions were comparable in accuracy and clinical relevance to those of pathology faculty, suggesting promise for further development. Serious concerns remain, however: without the ability to support its answers with references, AI will face legitimate scrutiny over how it can be integrated into medical decision-making.
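For readers curious how the pairwise group comparisons reported in the Results might be computed, the sketch below illustrates one plausible approach. The abstract does not name the statistical test used, so the choice of a Mann-Whitney U test on per-question scores, as well as the group names and the example score data, are assumptions for illustration only, not the authors' method.

```python
# Illustrative only: the paper does not specify its statistical test.
# This sketch assumes each responder group received a per-question score
# (e.g., on a 1-5 scale) and compares groups pairwise with a two-sided
# Mann-Whitney U test.
from scipy.stats import mannwhitneyu

# Hypothetical per-question scores for each group (not the study's data).
scores = {
    "faculty":             [5, 5, 4, 5, 5, 4, 5],
    "ai_chatbot":          [4, 5, 4, 4, 3, 4, 5],
    "residents_prepared":  [4, 3, 4, 3, 3, 4, 3],
    "residents_unprepared": [3, 2, 3, 3, 3, 3, 3],
}

# Pairwise two-sided comparisons between all groups.
groups = list(scores)
for i, a in enumerate(groups):
    for b in groups[i + 1:]:
        stat, p = mannwhitneyu(scores[a], scores[b], alternative="two-sided")
        print(f"{a} vs {b}: U = {stat:.1f}, P = {p:.3f}")
```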

Source journal: American Journal of Clinical Pathology
CiteScore: 7.70
Self-citation rate: 2.90%
Articles published: 367
Review turnaround: 3-6 weeks

Journal description: The American Journal of Clinical Pathology (AJCP) is the official journal of the American Society for Clinical Pathology and the Academy of Clinical Laboratory Physicians and Scientists. It is a leading international journal for publication of articles concerning novel anatomic pathology and laboratory medicine observations on human disease. AJCP emphasizes articles that focus on the application of evolving technologies for the diagnosis and characterization of diseases and conditions, as well as those that have a direct link toward improving patient care.