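Let's Have a Chat: How Well Does an Artificial Intelligence Chatbot Answer Clinical Infectious Diseases Pharmacotherapy Questions?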

Impact factor 3.8 · CAS Zone 4 (Medicine) · JCR Q2 (Immunology)
Open Forum Infectious Diseases Pub Date : 2024-10-25 eCollection Date: 2024-11-01 DOI:10.1093/ofid/ofae641
Wesley D Kufel, Kathleen D Hanrahan, Robert W Seabury, Katie A Parsels, Jason C Gallagher, Conan MacDougall, Elizabeth W Covington, Elias B Chahine, Rachel S Britt, Jeffrey M Steele
Journal: Open Forum Infectious Diseases, 11(11), ofae641. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11551448/pdf/
Citations: 0

Abstract

Let's Have a Chat: How Well Does an Artificial Intelligence Chatbot Answer Clinical Infectious Diseases Pharmacotherapy Questions?

Background: It is unknown whether ChatGPT provides quality responses to infectious diseases (ID) pharmacotherapy questions. This study surveyed ID pharmacist subject matter experts (SMEs) to assess the quality of ChatGPT version 3.5 (GPT-3.5) responses.

Methods: The primary outcome was the percentage of GPT-3.5 responses considered useful by SME rating. Secondary outcomes were SMEs' ratings of correctness, completeness, and safety. Rating definitions were based on literature review. One hundred ID pharmacotherapy questions were entered into GPT-3.5 without custom instructions or additional prompts, and responses were recorded. A 0-10 rating scale for correctness, completeness, and safety was developed and validated for interrater reliability. Continuous and categorical variables were assessed for interrater reliability via average measures intraclass correlation coefficient and Fleiss multirater kappa, respectively. SMEs' responses were compared by the Kruskal-Wallis test and chi-square test for continuous and categorical variables.
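As an illustration of the interrater statistic named above, the following is a minimal Python sketch of Fleiss' multirater kappa for a categorical rating such as "useful" versus "not useful". It is not the authors' analysis code. The function name `fleiss_kappa`, the input layout, and the toy rating table are assumptions made for illustration, and the table is not study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who placed subject i in category j.
    Every subject must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")

    # Share of all ratings that fell into each category.
    total = n_subjects * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per subject, then averaged over subjects.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects

    # Agreement expected by chance from the category shares.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical table: 4 responses, 5 raters, columns = [useful, not useful].
toy_counts = [[5, 0], [3, 2], [1, 4], [0, 5]]
print(round(fleiss_kappa(toy_counts), 3))
```

A value of 1 means the raters agreed on every response, and values near 0 mean agreement no better than chance.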

Results: SMEs considered 41.8% of responses useful. Median (IQR) ratings for correctness, completeness, and safety were 7 (4-9), 5 (3-8), and 8 (4-10), respectively. The Fleiss multirater kappa for usefulness was 0.379 (95% CI, .317-.441) indicating fair agreement, and intraclass correlation coefficients were 0.820 (95% CI, .758-.870), 0.745 (95% CI, .656-.816), and 0.833 (95% CI, .775-.880) for correctness, completeness, and safety, indicating at least substantial agreement. No significant difference was observed among SME responses for percentage of responses considered useful.
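The agreement labels in the results ("fair", "at least substantial") match the widely used Landis and Koch bands for kappa-type statistics. The abstract does not name the scale that was used, so the mapping below is an assumption. The helper `landis_koch_label` is a hypothetical name introduced only to show how the reported coefficients fall into those bands.

```python
def landis_koch_label(value: float) -> str:
    """Map an agreement coefficient to its Landis and Koch descriptor.

    Assumption: the abstract's wording follows these bands; it does not say so.
    """
    if value > 1.0:
        raise ValueError("agreement coefficients cannot exceed 1")
    if value < 0.0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if value <= upper:
            return label
    raise AssertionError("unreachable: value is within [0, 1]")


# Coefficients reported in the abstract.
reported = {
    "usefulness (Fleiss kappa)": 0.379,
    "correctness (ICC)": 0.820,
    "completeness (ICC)": 0.745,
    "safety (ICC)": 0.833,
}
for name, coefficient in reported.items():
    print(f"{name}: {coefficient:.3f} -> {landis_koch_label(coefficient)}")
```

Under these bands the usefulness kappa is "fair", and the three intraclass correlation coefficients are "substantial" or higher, which is consistent with the abstract's wording.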

Conclusions: Fewer than 50% of GPT-3.5 responses were considered useful by SMEs. Responses were mostly considered correct and safe but were often incomplete, suggesting that GPT-3.5 responses may not replace an ID pharmacist's responses.

Source journal
Open Forum Infectious Diseases (Medicine-Neurology (clinical))
CiteScore: 6.70
Self-citation rate: 4.80%
Articles published: 630
Review time: 9 weeks
About the journal: Open Forum Infectious Diseases provides a global forum for the publication of clinical, translational, and basic research findings in a fully open access, online journal environment. The journal reflects the broad diversity of the field of infectious diseases, and focuses on the intersection of biomedical science and clinical practice, with a particular emphasis on knowledge that holds the potential to improve patient care in populations around the world. Fully peer-reviewed, OFID supports the international community of infectious diseases experts by providing a venue for articles that further the understanding of all aspects of infectious diseases.