Chie Tanaka, Takahiro Kinoshita, Yohei Okada, Kasumi Satoh, Yosuke Homma, Kensuke Suzuki, Shoji Yokobori, Jun Oda, Yasuhiro Otomo, Takashi Tagami, Special Committee on the Utilization of Advanced Technology in Emergency Medicine, Japanese Association for Acute Medicine
Medical validity and layperson interpretation of emergency visit recommendations by the GPT model: A cross-sectional study
Acute Medicine & Surgery, Volume 12, Issue 1. Published March 12, 2025. DOI: 10.1002/ams2.70042
https://onlinelibrary.wiley.com/doi/10.1002/ams2.70042
Abstract
Aim
In Japan, many emergency ambulance dispatches involve minor cases that could be managed in outpatient care, underscoring the need for better public guidance on when emergency care is warranted. This study evaluated both the medical plausibility of the GPT model's advice in helping laypersons determine the need for emergency medical care and laypersons' interpretations of its outputs.
Methods
This cross-sectional study was conducted from December 10, 2023, to March 7, 2024. We input clinical scenarios into the GPT model and evaluated the need for emergency visits based on the outputs. A total of 314 scenarios were labeled with red tags (emergency, immediate emergency department [ED] visit) and 152 with green tags (less urgent). Seven medical specialists assessed the outputs' validity, and 157 laypersons interpreted them via a web-based questionnaire.
Results
Experts reported that the GPT model accurately identified important information in 95.9% (301/314) of red-tagged scenarios and recommended immediate ED visits in 96.5% (303/314). However, laypersons interpreted the outputs as indicating an urgent hospital visit in only 43.0% (135/314) of these cases. For green-tagged scenarios, the model identified important information in 99.3% (151/152) and advised against immediate visits in 88.8% (135/152), yet laypersons interpreted the outputs as indicating routine follow-up in only 32.2% (49/152) of cases.
Conclusions
Expert evaluations indicated that the GPT model can advise on emergency visits with high accuracy. However, laypersons frequently misinterpreted its recommendations, highlighting a substantial gap in lay understanding of AI-generated medical advice.