Navigating ChatGPT's alignment with expert consensus on pediatric OSA management

Eileen C. Howard, Jonathan M. Carnino, Nicholas Y.K. Chong, Jessica R. Levi

International Journal of Pediatric Otorhinolaryngology, vol. 186, Article 112131 (published 2024-10-15). DOI: 10.1016/j.ijporl.2024.112131
Citations: 0
Abstract
Objective
This study aimed to evaluate the potential integration of artificial intelligence (AI), specifically ChatGPT, into healthcare decision-making, focusing on its alignment with expert consensus statements regarding the management of persistent pediatric obstructive sleep apnea.
Methods
We analyzed ChatGPT's responses to 52 statements from the 2024 expert consensus statement (ECS) on the management of pediatric persistent OSA after adenotonsillectomy. Each statement was input into ChatGPT using a 9-point Likert scale format, with each statement entered three times to calculate mean scores and standard deviations. Statistical analysis was performed using Excel.
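The analysis described above can be sketched in a few lines of Python rather than Excel. The sketch below is purely illustrative: the Likert ratings and ECS means are invented placeholder values, not data from the study, and the variable names are assumptions. It computes, for each statement, the mean and standard deviation of the three ChatGPT ratings and classifies the statement by its absolute difference from the consensus mean (the within-1.0 and 2.0-or-greater categories reported in the Results).

```python
from statistics import mean, stdev

# Hypothetical example records: three ChatGPT Likert ratings (1-9) per
# statement plus the expert consensus statement (ECS) mean score.
# These values are invented for illustration only.
statements = [
    {"chatgpt": [7, 8, 7], "ecs_mean": 7.5},
    {"chatgpt": [3, 4, 2], "ecs_mean": 6.0},
    {"chatgpt": [5, 5, 6], "ecs_mean": 5.2},
]

within_1 = 0        # statements where |ChatGPT mean - ECS mean| <= 1.0
diff_2_or_more = 0  # statements where the difference is >= 2.0

for s in statements:
    m = mean(s["chatgpt"])
    sd = stdev(s["chatgpt"])  # spread across the three repeated entries
    diff = abs(m - s["ecs_mean"])
    if diff <= 1.0:
        within_1 += 1
    if diff >= 2.0:
        diff_2_or_more += 1

print(within_1, diff_2_or_more)
```

Dividing each count by the number of statements (52 in the study) yields the percentages reported in the Results.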
Results
ChatGPT's responses were within 1.0 of the consensus statement mean score for 63 % (33/52) of the statements. For 13 % (7/52) of the statements, ChatGPT's mean response differed from the ECS mean by 2.0 or more; the majority of these fell in the surgical and medical management categories. These large-discrepancy statements highlighted the risk of disseminating incorrect information on established medical topics, and notable variation across repeated responses suggested inconsistencies in ChatGPT's reliability.
Conclusion
While ChatGPT demonstrated a promising ability to align with expert medical opinions in many cases, its inconsistencies and potential to propagate inaccuracies in contested areas raise important considerations for its application in clinical settings. The findings underscore the need for ongoing evaluation and refinement of AI tools in healthcare, emphasizing collaboration between AI developers, healthcare professionals, and regulatory bodies to ensure AI's safe and effective integration into medical decision-making processes.
Journal description:
The purpose of the International Journal of Pediatric Otorhinolaryngology is to concentrate and disseminate information concerning prevention, cure and care of otorhinolaryngological disorders in infants and children due to developmental, degenerative, infectious, neoplastic, traumatic, social, psychiatric and economic causes. The Journal provides a medium for clinical and basic contributions in all of the areas of pediatric otorhinolaryngology. This includes medical and surgical otology, bronchoesophagology, laryngology, rhinology, diseases of the head and neck, and disorders of communication, including voice, speech and language disorders.