Title: Artificial Intelligence as a Language Barrier Application in a Simulated Health Care Setting: A Proof-of-Concept Study
Authors: Nicholas Hampers, Rita Thieme, Louis Hampers
Journal: Pediatric Emergency Care, Vol. 41, No. 6, pp. 481-485 (Q3, Emergency Medicine; Impact Factor 1.2)
DOI: https://doi.org/10.1097/PEC.0000000000003369
Published: 2025-06-01 (Epub 2025-03-04)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12118609/pdf/
Citations: 0
Abstract
Objective: We evaluated the accuracy of an artificial intelligence program (ChatGPT 4.0) as a medical translation modality in a simulated pediatric urgent care setting.
Methods: Two entirely separate instances of ChatGPT 4.0 were used. The first served as a simulated patient (SP). The SP generated complaints and symptoms while processing and generating text only in Spanish. A human provider (blinded to the diagnosis) conducted a clinical "visit" with the SP, typing questions and instructions in English only. The second instance of ChatGPT 4.0 served as the artificial medical interpreter (AMI). The AMI translated the provider's questions and instructions from English to Spanish, and the SP's responses and concerns from Spanish to English, in real time. Post-visit transcripts were then reviewed for errors by a certified human medical interpreter.
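The three-party protocol described above can be sketched as a simple relay loop. This is a minimal illustration only: `query_model` is a hypothetical placeholder standing in for a real ChatGPT 4.0 API call, and the system prompts are paraphrased assumptions, not the study's actual prompts.

```python
# Sketch of the study's two-instance relay: provider (English) -> AMI -> SP (Spanish) -> AMI -> provider.
# `query_model` is a hypothetical stub; a real implementation would call the chat API here.

SP_PROMPT = "You are a simulated pediatric patient. Respond only in Spanish."
AMI_EN_ES = "Translate the provider's English text into Spanish."
AMI_ES_EN = "Translate the patient's Spanish text into English."

def query_model(system_prompt: str, text: str) -> str:
    # Placeholder: echoes the role so the relay structure is visible.
    return f"[{system_prompt.split('.')[0]}] {text}"

def visit_turn(provider_utterance_en: str) -> str:
    """One provider -> patient -> provider exchange routed through the interpreter."""
    question_es = query_model(AMI_EN_ES, provider_utterance_en)  # English -> Spanish
    reply_es = query_model(SP_PROMPT, question_es)               # SP answers in Spanish
    reply_en = query_model(AMI_ES_EN, reply_es)                  # Spanish -> English
    return reply_en
```

Keeping the SP and the AMI as fully separate model instances, as the study does, prevents the "patient" from leaking information to the "interpreter" outside the translated text.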
Results: We conducted 10 simulated visits, with 3597 words translated by the AMI (1331 English and 2266 Spanish). There were 23 errors (raw accuracy rate of 99.4%). Errors were categorized as 9 omissions, 2 additions, 11 substitutions, and 1 editorialization. Three errors were judged to have potential clinical consequences, although these were minor ambiguities, readily resolved by the provider during the visit. The AMI also made repeated errors of grammatical gender (masculine/feminine) and second-person formality ("usted"/"tú"). None of these were judged to have potential clinical consequences.
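The raw accuracy figure follows directly from the reported counts; the short check below reproduces it (the error-category names are taken from the abstract, and treating accuracy as errors divided by total words translated is an assumption about how the authors computed it).

```python
# Reproduce the abstract's raw accuracy figure from the reported counts.
total_words = 1331 + 2266  # English + Spanish words translated by the AMI
errors = {"omission": 9, "addition": 2, "substitution": 11, "editorialization": 1}
total_errors = sum(errors.values())

accuracy_pct = (1 - total_errors / total_words) * 100
print(total_words, total_errors, round(accuracy_pct, 1))  # 3597 23 99.4
```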
Conclusions: The AMI accurately and safely translated the written content of simulated urgent care visits. It may serve as the basis for an expedient, cost-effective medical interpreter modality. Further work should seek to couple this translation accuracy with speech recognition and generative technology in trials with actual patients.
Journal description:
Pediatric Emergency Care® features clinically relevant original articles with an emergency-medicine perspective on the care of acutely ill or injured children and adolescents. The journal is aimed both at pediatricians who want to know more about treating (and being compensated for) minor emergency cases, and at emergency physicians who must treat children and adolescents as part of a broader practice.