Assessing the Soft Tissue Infection Expertise of ChatGPT and Bard Compared to IDSA Recommendations

IF 5.4 · CAS Tier 2 (Medicine) · JCR Q3 ENGINEERING, BIOMEDICAL
Mario Alessandri-Bonetti, Riccardo Giorgino, Michelle Naegeli, Hilary Y. Liu, Francesco M. Egro
DOI: 10.1007/s10439-023-03372-1
Journal: Annals of Biomedical Engineering, Vol. 52, No. 6, pp. 1551–1553
Published: 2023-10-21 (Journal Article)
Full text: https://link.springer.com/article/10.1007/s10439-023-03372-1
Citations: 0

Abstract

Assessing the Soft Tissue Infection Expertise of ChatGPT and Bard Compared to IDSA Recommendations

The aim of the study was to evaluate whether ChatGPT-3.5 and Bard provide safe and reliable medical answers to common topics related to soft tissue infections and their management according to the guidelines provided by the Infectious Diseases Society of America (IDSA). IDSA’s abridged recommendations for soft tissue infections were identified on the IDSA official website. Twenty-five queries were entered into the LLMs exactly as they appear on the IDSA website. To assess the concordance and precision of the LLMs’ responses with the IDSA guidelines, two infectious disease physicians independently compared and evaluated each response. This was done using a 5-point Likert scale, with 1 representing poor concordance and 5 excellent concordance, as adapted from the validated Global Quality Scale. The mean ± SD score for ChatGPT-generated responses was 4.34 ± 0.74, n = 25, indicating that raters found the answers to be of good to excellent quality, with the most important topics covered. Although some topics were not covered, the answers were in good concordance with the IDSA guidelines. The mean ± SD score for Bard-generated responses was 3.5 ± 1.2, n = 25, indicating moderate quality. Although the LLMs did not appear to provide incorrect recommendations and covered most of the topics, the responses were often generic, rambling, missing some details, and lacking actionability. As AI continues to evolve and researchers feed it with more extensive and diverse medical knowledge, it may be inching closer to becoming a reliable aid for clinicians, ultimately enhancing the accuracy of infectious disease diagnosis and management in the future.
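The study's summary statistic is a simple mean ± sample standard deviation over the 25 per-query Likert ratings. The sketch below illustrates that computation; the rating lists are hypothetical placeholders, since the paper's raw per-query scores are not reproduced here.

```python
import statistics

# Hypothetical Likert ratings (1 = poor, 5 = excellent concordance).
# These are illustrative values only, NOT the study's actual data.
chatgpt_scores = [5, 4, 5, 4, 3, 5, 4, 5, 4, 4]
bard_scores = [4, 3, 2, 5, 3, 4, 2, 5, 3, 4]

def summarize(scores):
    """Return (mean, sample standard deviation), the study's mean ± SD."""
    return statistics.mean(scores), statistics.stdev(scores)

for name, scores in [("ChatGPT", chatgpt_scores), ("Bard", bard_scores)]:
    mean, sd = summarize(scores)
    print(f"{name}: {mean:.2f} ± {sd:.2f} (n = {len(scores)})")
```

With real data, each list would hold one rating per query (n = 25), typically after averaging or reconciling the two independent raters' scores.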

Source Journal
Annals of Biomedical Engineering (Engineering & Technology – Engineering: Biomedical)
CiteScore: 7.50
Self-citation rate: 15.80%
Articles per year: 212
Review time: 3 months
Journal Introduction: Annals of Biomedical Engineering is an official journal of the Biomedical Engineering Society, publishing original articles in the major fields of bioengineering and biomedical engineering. The Annals is an interdisciplinary and international journal with the aim to highlight integrated approaches to the solutions of biological and biomedical problems.