Felix A Weuthen, Nelly Otte, Hanif Krabbe, Thomas Kraus, Julia Krabbe
{"title":"ChatGPT与网络研究在职业医学临床研究与决策中的比较:随机对照试验。","authors":"Felix A Weuthen, Nelly Otte, Hanif Krabbe, Thomas Kraus, Julia Krabbe","doi":"10.2196/63857","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence is becoming a part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored.</p><p><strong>Objective: </strong>This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.</p><p><strong>Methods: </strong>In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work on 3 cases of occupational lung diseases and answer case-related questions. They were allocated via coin weighted for proportions of physicians in each group into 2 groups. One group researched the cases using an integrated chat application similar to ChatGPT based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers, while secondary outcomes included changes in specific question accuracy and self-assessed occupational medicine expertise before and after case processing. Group assignment was not traditionally blinded, as the chat window indicated membership; participants only knew the study examined web-based research, not group specifics.</p><p><strong>Results: </strong>Participants of the ChatGPT group (n=27) showed better performance in specific research, for example, for potentially hazardous substances or activities (eg, case 1: ChatGPT group 2.5 hazardous substances that cause pleural changes versus 1.8 in a group with own research; P=.01; Cohen r=-0.38), and led to an increase in self-assessment with regard to specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own research group; German school grades between 1=very good and 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly as a result of the participant's own research (n=29; eg, case 1: Should an occupational disease report be filed? Yes for 7 participants in the ChatGPT group vs 14 in their own research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36).</p><p><strong>Conclusions: </strong>ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported and not made by the large language model. 
Future systems should be critically assessed, even if the initial results are promising.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e63857"},"PeriodicalIF":2.0000,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12112251/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.\",\"authors\":\"Felix A Weuthen, Nelly Otte, Hanif Krabbe, Thomas Kraus, Julia Krabbe\",\"doi\":\"10.2196/63857\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Artificial intelligence is becoming a part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored.</p><p><strong>Objective: </strong>This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.</p><p><strong>Methods: </strong>In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work on 3 cases of occupational lung diseases and answer case-related questions. They were allocated via coin weighted for proportions of physicians in each group into 2 groups. One group researched the cases using an integrated chat application similar to ChatGPT based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers, while secondary outcomes included changes in specific question accuracy and self-assessed occupational medicine expertise before and after case processing. Group assignment was not traditionally blinded, as the chat window indicated membership; participants only knew the study examined web-based research, not group specifics.</p><p><strong>Results: </strong>Participants of the ChatGPT group (n=27) showed better performance in specific research, for example, for potentially hazardous substances or activities (eg, case 1: ChatGPT group 2.5 hazardous substances that cause pleural changes versus 1.8 in a group with own research; P=.01; Cohen r=-0.38), and led to an increase in self-assessment with regard to specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own research group; German school grades between 1=very good and 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly as a result of the participant's own research (n=29; eg, case 1: Should an occupational disease report be filed? 
Yes for 7 participants in the ChatGPT group vs 14 in their own research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36).</p><p><strong>Conclusions: </strong>ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported and not made by the large language model. Future systems should be critically assessed, even if the initial results are promising.</p>\",\"PeriodicalId\":14841,\"journal\":{\"name\":\"JMIR Formative Research\",\"volume\":\"9 \",\"pages\":\"e63857\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12112251/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Formative Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2196/63857\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/63857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.
Background: Artificial intelligence is becoming part of daily life and of the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are surging in popularity due to their improved performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored.
Objective: This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.
Methods: In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work on 3 cases of occupational lung diseases and answer case-related questions. They were allocated to 2 groups via a coin toss weighted to balance the proportion of physicians in each group. One group researched the cases using an integrated chat application similar to ChatGPT, based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers; secondary outcomes included changes in accuracy on specific questions and in self-assessed occupational medicine expertise before and after case processing. Group assignment was not blinded in the traditional sense, as the chat window revealed group membership; however, participants knew only that the study examined web-based research, not the specifics of the groups.
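The weighted-coin allocation can be illustrated with a short sketch. This is a minimal, hypothetical reconstruction, not the authors' actual procedure: the abstract states only that the coin was weighted for the proportion of physicians in each group, so the Efron-style biased coin within physician/student strata and the BIAS value below are assumptions for illustration.

```python
import random

# Probability of assigning to the currently under-filled arm
# (arbitrary placeholder; the paper does not report this value).
BIAS = 0.7

def allocate(is_physician: bool, counts: dict) -> str:
    """Assign one participant to 'chatgpt' or 'own_research',
    biasing the coin toward whichever arm has fewer participants
    of the same stratum (physician or student)."""
    stratum = "physician" if is_physician else "student"
    a = counts["chatgpt"][stratum]
    b = counts["own_research"][stratum]
    if a == b:
        p_chatgpt = 0.5          # strata balanced: fair coin
    elif a < b:
        p_chatgpt = BIAS         # ChatGPT arm under-filled: favor it
    else:
        p_chatgpt = 1 - BIAS     # own-research arm under-filled
    group = "chatgpt" if random.random() < p_chatgpt else "own_research"
    counts[group][stratum] += 1
    return group

# Usage: allocate 56 participants, alternating physicians and students
counts = {"chatgpt": {"physician": 0, "student": 0},
          "own_research": {"physician": 0, "student": 0}}
for i in range(56):
    allocate(is_physician=(i % 2 == 0), counts=counts)
print(counts)
```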
Results: Participants of the ChatGPT group (n=27) showed better performance in specific research, for example, for potentially hazardous substances or activities (eg, case 1: ChatGPT group 2.5 hazardous substances that cause pleural changes versus 1.8 in a group with own research; P=.01; Cohen r=-0.38), and led to an increase in self-assessment with regard to specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own research group; German school grades between 1=very good and 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly as a result of the participant's own research (n=29; eg, case 1: Should an occupational disease report be filed? Yes for 7 participants in the ChatGPT group vs 14 in their own research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36).
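For readers unfamiliar with the reported statistics, the sketch below shows how an odds ratio with a Wald 95% CI and a Fisher exact P value are conventionally derived from a 2x2 table. The cell counts are built from the case 1 report-filing question (7/27 vs 14/29), but the abstract does not give the exact table or model behind the reported OR of 6.00, so the output is illustrative and will not necessarily reproduce the published values.

```python
import math
from scipy.stats import fisher_exact

# 2x2 table for "Should an occupational disease report be filed?"
# (correct "yes" answers: 14/29 own research, 7/27 ChatGPT).
table = [[14, 15],  # own-research group: correct, incorrect
         [7, 20]]   # ChatGPT group: correct, incorrect

sample_or, p_value = fisher_exact(table)

# Wald 95% CI on the log odds ratio
(a, b), (c, d) = table
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(sample_or) - 1.96 * se)
ci_high = math.exp(math.log(sample_or) + 1.96 * se)
print(f"OR {sample_or:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}, "
      f"P={p_value:.3f}")
```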
Conclusions: ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, large language models should currently only support clinical decisions, not make them. Future systems should be assessed critically, even if initial results are promising.