Navigating ChatGPT's alignment with expert consensus on pediatric OSA management

IF 1.2; Zone 4 (Medicine); JCR Q3 (Otorhinolaryngology)

International Journal of Pediatric Otorhinolaryngology; Pub Date: 2024-10-15; DOI: 10.1016/j.ijporl.2024.112131

Eileen C. Howard, Jonathan M. Carnino, Nicholas Y.K. Chong, Jessica R. Levi

Citation: International Journal of Pediatric Otorhinolaryngology, vol. 186, Article 112131. Journal Article. https://www.sciencedirect.com/science/article/pii/S0165587624002854

Citations: 0

Abstract

Objective

This study aimed to evaluate the potential integration of artificial intelligence (AI), specifically ChatGPT, into healthcare decision-making, focusing on its alignment with expert consensus statements regarding the management of persistent pediatric obstructive sleep apnea.

Methods

We analyzed ChatGPT's responses to 52 statements from the 2024 expert consensus statement (ECS) on the management of pediatric persistent OSA after adenotonsillectomy. Each statement was input into ChatGPT using a 9-point Likert scale format, with each statement entered three times to calculate mean scores and standard deviations. Statistical analysis was performed using Excel.
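The per-statement summary described above (three runs, then a mean and standard deviation) can be sketched as follows. This is a minimal illustration, not the authors' Excel workflow: the function name `summarize_runs` and the example ratings are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which form was used.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample SD) for repeated Likert ratings of one statement.

    Assumes a 1-9 scale and the sample (n - 1) standard deviation;
    the abstract does not specify which SD variant was computed.
    """
    if any(not 1 <= s <= 9 for s in scores):
        raise ValueError("ratings must lie on the 1-9 Likert scale")
    return mean(scores), stdev(scores)


# Hypothetical ratings from three ChatGPT runs on a single statement
m, sd = summarize_runs([7, 8, 9])
print(f"mean={m}, sd={sd}")
```

A nonzero standard deviation across the three runs is what indicates run-to-run variation for a given statement.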

Results

ChatGPT's mean response fell within 1.0 of the consensus statement mean score for 63% (33/52) of the statements. For 13% (7/52) of the statements, the ChatGPT mean differed from the ECS mean by 2.0 or more; most of these fell in the surgical and medical management categories. These large discrepancies highlight the risk of disseminating incorrect information on established medical topics, and the notable variation across repeated responses suggests inconsistencies in ChatGPT's reliability.
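The agreement thresholds reported above can be expressed as a simple bucketing rule. The sketch below is illustrative only: the function names, bucket labels, and example score pairs are hypothetical, and treating both thresholds as inclusive is an assumption, since the abstract does not state how boundary cases were handled.

```python
from collections import Counter


def classify(chatgpt_mean, ecs_mean):
    """Bucket one statement by absolute difference between the two means.

    Assumption: "within 1.0" and "2.0 or greater" are both inclusive.
    """
    diff = abs(chatgpt_mean - ecs_mean)
    if diff <= 1.0:
        return "within_1"
    if diff >= 2.0:
        return "differs_by_2_or_more"
    return "between"


def tally(pairs):
    """Count statements per bucket for (chatgpt_mean, ecs_mean) pairs."""
    return Counter(classify(c, e) for c, e in pairs)


# Hypothetical (ChatGPT mean, ECS mean) pairs, not data from the study
counts = tally([(8.0, 7.5), (3.0, 6.0), (5.0, 6.5)])
print(counts)

# The reported percentages follow from the reported counts
print(round(100 * 33 / 52), round(100 * 7 / 52))
```

Under these assumptions, the remaining 12 of 52 statements (52 − 33 − 7) would fall in the intermediate bucket.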

Conclusion

While ChatGPT demonstrated a promising ability to align with expert medical opinions in many cases, its inconsistencies and potential to propagate inaccuracies in contested areas raise important considerations for its application in clinical settings. The findings underscore the need for ongoing evaluation and refinement of AI tools in healthcare, emphasizing collaboration between AI developers, healthcare professionals, and regulatory bodies to ensure AI's safe and effective integration into medical decision-making processes.




Source journal

International Journal of Pediatric Otorhinolaryngology (Medicine: Otorhinolaryngology)

CiteScore: 3.20

Self-citation rate: 6.70%

Articles published: 276

Review time: 62 days

About the journal: The purpose of the International Journal of Pediatric Otorhinolaryngology is to concentrate and disseminate information concerning prevention, cure and care of otorhinolaryngological disorders in infants and children due to developmental, degenerative, infectious, neoplastic, traumatic, social, psychiatric and economic causes. The Journal provides a medium for clinical and basic contributions in all of the areas of pediatric otorhinolaryngology. This includes medical and surgical otology, bronchoesophagology, laryngology, rhinology, diseases of the head and neck, and disorders of communication, including voice, speech and language disorders.