[ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models: Between hype and reality around artificial intelligence in medical use]

Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz
{"title":"[用于麻醉学和急诊医学技术强化学习的 ChatGPT 以及人工智能语言模型的潜在临床应用:人工智能在医疗应用中的炒作与现实之间]。","authors":"Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz","doi":"10.1007/s00101-024-01403-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The utilization of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies conducted by various research groups have demonstrated that language models can answer questions related to medical board examinations, and there are potential applications of these models in medical education as well.</p><p><strong>Research question: </strong>This study aims to investigate the extent to which current version language models prove effective for addressing medical inquiries, their potential utility in medical education, and the challenges that still exist in the functioning of AI language models.</p><p><strong>Method: </strong>The program ChatGPT, based on GPT 3.5, had to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether any errors and what types of errors occurred. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine. These essays were analyzed afterwards and checked for errors and anomalies.</p><p><strong>Results: </strong>The findings indicated that ChatGPT was able to correctly answer the questions with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in the accuracy of answering board examination questions compared to a study conducted in March; however, when it came to generating essays a high error rate was observed.</p><p><strong>Discussion: </strong>Considering the current pace of ongoing improvements in AI language models, widespread clinical implementation, especially in emergency departments as well as emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights to support medical professionals in their work, without relying solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Due to hallucinations caused by inadequate training environments for the language model, the generated texts might deviate from the current state of scientific knowledge. 
Direct deployment in patient care settings without permanent physician supervision does not yet appear to be achievable at present.</p>","PeriodicalId":72805,"journal":{"name":"Die Anaesthesiologie","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076380/pdf/","citationCount":"0","resultStr":"{\"title\":\"[ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models : Between hype and reality around artificial intelligence in medical use].\",\"authors\":\"Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz\",\"doi\":\"10.1007/s00101-024-01403-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The utilization of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies conducted by various research groups have demonstrated that language models can answer questions related to medical board examinations, and there are potential applications of these models in medical education as well.</p><p><strong>Research question: </strong>This study aims to investigate the extent to which current version language models prove effective for addressing medical inquiries, their potential utility in medical education, and the challenges that still exist in the functioning of AI language models.</p><p><strong>Method: </strong>The program ChatGPT, based on GPT 3.5, had to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether any errors and what types of errors occurred. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine. These essays were analyzed afterwards and checked for errors and anomalies.</p><p><strong>Results: </strong>The findings indicated that ChatGPT was able to correctly answer the questions with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in the accuracy of answering board examination questions compared to a study conducted in March; however, when it came to generating essays a high error rate was observed.</p><p><strong>Discussion: </strong>Considering the current pace of ongoing improvements in AI language models, widespread clinical implementation, especially in emergency departments as well as emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights to support medical professionals in their work, without relying solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Due to hallucinations caused by inadequate training environments for the language model, the generated texts might deviate from the current state of scientific knowledge. 
Direct deployment in patient care settings without permanent physician supervision does not yet appear to be achievable at present.</p>\",\"PeriodicalId\":72805,\"journal\":{\"name\":\"Die Anaesthesiologie\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076380/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Die Anaesthesiologie\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00101-024-01403-7\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Die Anaesthesiologie","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00101-024-01403-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Background: The utilization of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies conducted by various research groups have demonstrated that language models can answer questions related to medical board examinations, and there are potential applications of these models in medical education as well.

Research question: This study investigates the extent to which current versions of language models are effective for addressing medical questions, their potential utility in medical education, and the challenges that remain in how AI language models function.

Method: The program ChatGPT, based on GPT-3.5, had to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether errors occurred and, if so, what types. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine. These essays were subsequently analyzed and checked for errors and anomalies.
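
The abstract does not describe how the questions were presented to the model. Purely as an illustration, the following is a minimal sketch of how multiple-choice exam items could be submitted to a GPT-3.5 model via the OpenAI Python client and scored for accuracy; the prompt wording, data format, and file name (m2_questions.json) are assumptions, not the authors' actual procedure.

    # Hypothetical sketch: batch-scoring multiple-choice exam questions with an LLM.
    # Not the authors' pipeline; question data, file name, and prompt wording are illustrative.
    import json
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def ask_model(stem, options):
        """Ask the model one single-best-answer question and return the chosen letter."""
        prompt = stem + "\n" + "\n".join(f"{letter}) {text}" for letter, text in options.items())
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Answer the multiple-choice question with only the letter of the best option."},
                {"role": "user", "content": prompt},
            ],
            temperature=0,
        )
        return response.choices[0].message.content.strip()[:1].upper()

    def score(questions):
        """Return the fraction of questions answered correctly."""
        correct = sum(ask_model(q["stem"], q["options"]) == q["answer"] for q in questions)
        return correct / len(questions)

    if __name__ == "__main__":
        # Assumed file of M2-style items: [{"stem": ..., "options": {"A": ..., ...}, "answer": "A"}, ...]
        with open("m2_questions.json", encoding="utf-8") as f:
            questions = json.load(f)
        print(f"Accuracy: {score(questions):.1%}")
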

Results: ChatGPT answered the questions correctly with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in the accuracy of answering board examination questions compared to a study conducted in March; however, a high error rate was observed when generating essays.

Discussion: Given the current pace of improvement in AI language models, widespread clinical implementation, especially in emergency departments and in emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights to support medical professionals in their work, provided that professionals do not rely solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Because of hallucinations caused by inadequate training environments for the language model, the generated texts might deviate from the current state of scientific knowledge. Direct deployment in patient care settings without permanent physician supervision does not yet appear achievable.
