Using ChatGPT to Provide Patient-Specific Answers to Parental Questions in the PICU.
R Brandon Hunter, Satid Thammasitboon, Sreya S Rahman, Nina Fainberg, Andrew Renuart, Shelley Kumar, Parag N Jain, Brian Rissmiller, Moushumi Sur, Sanjiv Mehta
Pediatrics, published 2024-11-01. DOI: 10.1542/peds.2024-066615
Citations: 0
Abstract
Objectives: To determine if ChatGPT can incorporate patient-specific information to provide high-quality answers to parental questions in the PICU. We hypothesized that ChatGPT would generate high-quality, patient-specific responses.
Methods: In this cross-sectional study, we generated assessments and plans for 3 PICU patients with respiratory failure, septic shock, and status epilepticus and paired them with 8 typical parental questions. We prompted ChatGPT with instructions, an assessment and plan, and 1 question. Six PICU physicians evaluated the responses for accuracy (1-6), completeness (yes/no), empathy (1-6), and understandability (Patient Education Materials Assessment Tool, PEMAT, 0% to 100%; Flesch-Kincaid grade level). We compared answer quality among scenarios and question types using the Kruskal-Wallis and Fisher's exact tests. We used percent agreement, Cohen's kappa, and Gwet's agreement coefficient to estimate inter-rater reliability.
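The abstract does not include the authors' prompts or analysis code. The sketch below is a minimal, hypothetical illustration of such a pipeline: a ChatGPT call through the OpenAI chat completions API (the model name, prompt wording, and all example scores are assumptions, not the study's materials), followed by the statistical tests named above applied to reviewer ratings.

```python
# Hypothetical sketch of the study pipeline; model name, prompts, and
# example data are assumptions, not the authors' actual materials.
from openai import OpenAI
from scipy.stats import kruskal, fisher_exact
from sklearn.metrics import cohen_kappa_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_chatgpt(assessment_and_plan: str, parental_question: str) -> str:
    """Prompt ChatGPT with instructions, an assessment and plan, and one question."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumption; the abstract says only "ChatGPT"
        messages=[
            {"role": "system",
             "content": "Answer the parent's question about their child in the "
                        "PICU using the assessment and plan below."},
            {"role": "user",
             "content": f"Assessment and plan:\n{assessment_and_plan}\n\n"
                        f"Question: {parental_question}"},
        ],
    )
    return response.choices[0].message.content


# Invented accuracy scores (1-6) from six reviewers for three scenarios.
resp_failure = [5, 6, 5, 4, 6, 5]
septic_shock = [5, 5, 6, 6, 5, 4]
status_epi   = [6, 5, 5, 5, 6, 6]

# Kruskal-Wallis: do accuracy scores differ among the three scenarios?
h_stat, p_value = kruskal(resp_failure, septic_shock, status_epi)
print(f"Kruskal-Wallis H={h_stat:.2f}, p={p_value:.3f}")

# Fisher's exact test on a 2x2 table of completeness (yes/no) by question type.
odds_ratio, p_complete = fisher_exact([[23, 1], [22, 2]])
print(f"Fisher's exact p={p_complete:.3f}")

# Cohen's kappa between two reviewers' accuracy ratings on the same answers.
rater_a = [5, 6, 4, 5, 5, 6, 4, 5]
rater_b = [5, 5, 4, 6, 5, 6, 4, 4]
print(f"Cohen's kappa={cohen_kappa_score(rater_a, rater_b):.2f}")
```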
Results: All answers incorporated patient details, using them for reasoning in 59% of sentences. Responses had high accuracy (median 5.0, [interquartile range (IQR), 4.0-6.0]), empathy (median 5.0, [IQR, 5.0-6.0]), completeness (97% of responses rated complete), and understandability (PEMAT median 100%, [IQR, 87.5%-100%]; Flesch-Kincaid grade level 8.7). Only 4/144 reviewer scores were <4/6 in accuracy, and no response was deemed likely to cause harm. There was no difference in accuracy, completeness, empathy, or understandability among scenarios or question types. We found fair, substantial, and almost perfect agreement among reviewers for accuracy, empathy, and understandability, respectively.
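For context on the readability result, the Flesch-Kincaid grade level is a standard formula over word, sentence, and syllable counts; a score of 8.7 indicates text readable at roughly a ninth-grade level. A minimal sketch follows (the syllable counter is a crude heuristic, unlike the validated counters in dedicated readability tools, and the sample sentence is invented for illustration):

```python
# Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels in the word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59


sample = ("Your child is getting extra oxygen to help her breathe. "
          "The team will check on her often and adjust support as needed.")
print(f"Grade level: {flesch_kincaid_grade(sample):.1f}")
```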
Conclusions: ChatGPT used patient-specific information to provide high-quality answers to parental questions in PICU clinical scenarios.
Journal Introduction
Pediatrics® is the official flagship journal of the American Academy of Pediatrics (AAP). It is widely cited and recognized as the leading journal in pediatric medicine.
The journal publishes original research and evidence-based articles that provide authoritative information to help readers stay current with developments in pediatric medicine. All content is peer reviewed and undergoes rigorous evaluation to ensure its quality and reliability.
Pediatrics also serves as a resource for new research and for education and training in the field, aiming to improve the quality of pediatric outpatient and inpatient care by disseminating valuable knowledge and insights.
As of 2023, Pediatrics has a Journal Impact Factor of 8.0. The impact factor measures a journal's influence in the scientific community, with higher scores indicating greater impact; this score reflects the significance and reach of the research published in Pediatrics and its prominence in pediatric medicine.