Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial

Impact Factor: 2.0 | Q3 | Health Care Sciences & Services
Felix A Weuthen, Nelly Otte, Hanif Krabbe, Thomas Kraus, Julia Krabbe
{"title":"ChatGPT与网络研究在职业医学临床研究与决策中的比较:随机对照试验。","authors":"Felix A Weuthen, Nelly Otte, Hanif Krabbe, Thomas Kraus, Julia Krabbe","doi":"10.2196/63857","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence is becoming a part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored.</p><p><strong>Objective: </strong>This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.</p><p><strong>Methods: </strong>In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work on 3 cases of occupational lung diseases and answer case-related questions. They were allocated via coin weighted for proportions of physicians in each group into 2 groups. One group researched the cases using an integrated chat application similar to ChatGPT based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers, while secondary outcomes included changes in specific question accuracy and self-assessed occupational medicine expertise before and after case processing. Group assignment was not traditionally blinded, as the chat window indicated membership; participants only knew the study examined web-based research, not group specifics.</p><p><strong>Results: </strong>Participants of the ChatGPT group (n=27) showed better performance in specific research, for example, for potentially hazardous substances or activities (eg, case 1: ChatGPT group 2.5 hazardous substances that cause pleural changes versus 1.8 in a group with own research; P=.01; Cohen r=-0.38), and led to an increase in self-assessment with regard to specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own research group; German school grades between 1=very good and 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly as a result of the participant's own research (n=29; eg, case 1: Should an occupational disease report be filed? Yes for 7 participants in the ChatGPT group vs 14 in their own research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36).</p><p><strong>Conclusions: </strong>ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported and not made by the large language model. 
Future systems should be critically assessed, even if the initial results are promising.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e63857"},"PeriodicalIF":2.0000,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12112251/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.\",\"authors\":\"Felix A Weuthen, Nelly Otte, Hanif Krabbe, Thomas Kraus, Julia Krabbe\",\"doi\":\"10.2196/63857\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Artificial intelligence is becoming a part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored.</p><p><strong>Objective: </strong>This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.</p><p><strong>Methods: </strong>In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work on 3 cases of occupational lung diseases and answer case-related questions. They were allocated via coin weighted for proportions of physicians in each group into 2 groups. One group researched the cases using an integrated chat application similar to ChatGPT based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers, while secondary outcomes included changes in specific question accuracy and self-assessed occupational medicine expertise before and after case processing. Group assignment was not traditionally blinded, as the chat window indicated membership; participants only knew the study examined web-based research, not group specifics.</p><p><strong>Results: </strong>Participants of the ChatGPT group (n=27) showed better performance in specific research, for example, for potentially hazardous substances or activities (eg, case 1: ChatGPT group 2.5 hazardous substances that cause pleural changes versus 1.8 in a group with own research; P=.01; Cohen r=-0.38), and led to an increase in self-assessment with regard to specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own research group; German school grades between 1=very good and 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly as a result of the participant's own research (n=29; eg, case 1: Should an occupational disease report be filed? 
Yes for 7 participants in the ChatGPT group vs 14 in their own research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36).</p><p><strong>Conclusions: </strong>ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported and not made by the large language model. Future systems should be critically assessed, even if the initial results are promising.</p>\",\"PeriodicalId\":14841,\"journal\":{\"name\":\"JMIR Formative Research\",\"volume\":\"9 \",\"pages\":\"e63857\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12112251/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Formative Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2196/63857\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/63857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract


Background: Artificial intelligence is becoming a part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored.

Objective: This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.

Methods: In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work on 3 cases of occupational lung diseases and answer case-related questions. They were allocated into 2 groups via a coin toss weighted to balance the proportion of physicians across groups. One group researched the cases using an integrated chat application similar to ChatGPT, based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers; secondary outcomes included changes in accuracy on specific questions and in self-assessed occupational medicine expertise before and after case processing. Group assignment was not blinded in the traditional sense, as the chat window revealed group membership; participants knew only that the study examined web-based research, not the group specifics.
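The allocation procedure is only summarized in the abstract; the sketch below shows what a physician-weighted coin toss could look like. It is illustrative only: the `allocate` function, the 0.7 bias, and the group bookkeeping are assumptions, not the study's actual code.

```python
import random

def allocate(is_physician, groups, bias=0.7):
    """Assign a participant to one of two groups, weighting the coin so the
    proportion of physicians stays roughly balanced across groups.
    `groups` maps group name -> {"total": int, "physicians": int}."""
    def physician_share(name):
        g = groups[name]
        return g["physicians"] / g["total"] if g["total"] else 0.0

    if is_physician:
        # Bias the coin toward the group currently short on physicians.
        # The 0.7 weight is an assumed value; the paper does not report it.
        short = min(groups, key=physician_share)
        other = next(name for name in groups if name != short)
        choice = short if random.random() < bias else other
    else:
        choice = random.choice(list(groups))

    groups[choice]["total"] += 1
    if is_physician:
        groups[choice]["physicians"] += 1
    return choice

# Example: allocate 56 participants, alternating physicians and students.
groups = {"chatgpt": {"total": 0, "physicians": 0},
          "own_research": {"total": 0, "physicians": 0}}
for i in range(56):
    allocate(is_physician=(i % 2 == 0), groups=groups)
print(groups)
```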

Results: Participants in the ChatGPT group (n=27) performed better on specific research tasks, for example, identifying potentially hazardous substances or activities (eg, case 1: 2.5 hazardous substances that cause pleural changes in the ChatGPT group versus 1.8 in the own-research group; P=.01; Cohen r=-0.38), and showed a greater improvement in self-assessed specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own-research group, on German school grades ranging from 1=very good to 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly by participants doing their own research (n=29; eg, case 1: Should an occupational disease report be filed? Yes for 7 participants in the ChatGPT group vs 14 in the own-research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36).
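For readers unfamiliar with the statistic: an odds ratio for a 2×2 table of group × decision is OR = (a·d)/(b·c), with a 95% CI conventionally taken from a normal approximation on the log scale. A minimal sketch follows. The cell counts are placeholders loosely inferred from the abstract; the full table is not reported there, and these placeholders do not reproduce the published OR of 6.00, which presumably reflects the actual cell counts or the model used in the paper.

```python
import math

# Hypothetical 2x2 table (placeholder counts, NOT the study's actual data):
# rows = group, columns = decision (correctly filed report / did not).
a, b = 14, 15   # own-research group: correct / not correct (assumed split)
c, d = 7, 20    # ChatGPT group: correct / not correct (assumed split)

# Point estimate: the cross-product ratio.
odds_ratio = (a * d) / (b * c)

# 95% CI via the log-odds normal approximation (Woolf method).
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
log_or = math.log(odds_ratio)
ci_low = math.exp(log_or - 1.96 * se)
ci_high = math.exp(log_or + 1.96 * se)

print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```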

Conclusions: ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported and not made by the large language model. Future systems should be critically assessed, even if the initial results are promising.

Source journal: JMIR Formative Research (Medicine, miscellaneous)
CiteScore: 2.70 | Self-citation rate: 9.10% | Publication volume: 579 | Review time: 12 weeks