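Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey.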

IF 2.8 | CAS Zone 3 (Medicine) | JCR Q1 PEDIATRICS
Saverio La Bella, Marina Attanasi, Annamaria Porreca, Armando Di Ludovico, Maria Cristina Maggio, Romina Gallizzi, Francesco La Torre, Donato Rigante, Francesca Soscia, Francesca Ardenti Morini, Antonella Insalaco, Marco Francesco Natale, Francesco Chiarelli, Gabriele Simonini, Fabrizio De Benedetti, Marco Gattorno, Luciana Breda
Pediatric Rheumatology, 2024; 22(1): 78. Published 2024-08-23. Journal Article.
DOI: 10.1186/s12969-024-01011-0 (https://doi.org/10.1186/s12969-024-01011-0)
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11342667/pdf/
Citations: 0

Abstract


Background: Artificial intelligence (AI) has become a popular tool for clinical and research use in medicine. This study aimed to evaluate the accuracy and reliability of a generative AI tool in answering questions on pediatric familial Mediterranean fever (FMF).

Methods: Fifteen questions on pediatric FMF, each submitted three times, were posed to the popular generative AI tool Microsoft Copilot with Chat-GPT 4.0. Nine pediatric rheumatology experts rated the accuracy of the responses in a blinded fashion, using a Likert-like scale from 1 to 5.

Results: Median scores for the overall responses ranged from 2.00 to 5.00 at the first assessment, from 2.00 to 4.00 at the second, and from 3.00 to 4.00 at the third. Intra-rater agreement was poor to moderate (intraclass correlation coefficient [ICC] range: -0.151 to 0.534). Agreement among experts declined over time, with Krippendorff's alpha falling from 0.136 (first response) to 0.132 (second response) and 0.089 (third response). Finally, experts' levels of trust in AI varied before and after the survey.
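Krippendorff's alpha, reported above, compares how much raters disagree on the same item with how much disagreement would be expected by chance: alpha = 1 - D_o/D_e. As a hedged illustration only, the sketch below computes alpha with the interval difference metric (c - k)^2. The abstract does not say which metric (nominal, ordinal, or interval) the authors used, so that choice, the function name `krippendorff_alpha_interval`, and the toy ratings are assumptions; this is not the study's code or data.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha using the interval metric delta(c, k) = (c - k) ** 2.

    `units` holds one list of ratings per rated item (e.g. one AI response
    scored by several experts). Missing ratings are simply left out.
    """
    # Only units with at least two ratings contribute pairable values.
    pairable = [list(u) for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: squared differences between all ordered pairs of
    # ratings within a unit, each unit weighted by 1 / (m_u - 1).
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    )
    # Expected disagreement: squared differences between all ordered pairs of
    # pairable values, regardless of which unit they came from.
    expected = sum((a - b) ** 2 for a, b in permutations(values, 2))
    if expected == 0:
        raise ValueError("all ratings are identical; alpha is undefined")

    # alpha = 1 - D_o / D_e with D_o = observed / n and D_e = expected / (n * (n - 1)).
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical toy ratings on a 1-5 scale: two items, two raters each.
    toy = [[1, 2], [3, 4]]
    print(round(krippendorff_alpha_interval(toy), 3))  # 0.7
```

Alpha equals 1 for perfect agreement, is near 0 when agreement is no better than chance, and turns negative under systematic disagreement, so values between 0.089 and 0.136 sit close to chance-level agreement.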

Conclusions: AI holds promise in pediatric rheumatology, including for early diagnosis and optimized management, but challenges persist because the reliability of its information is uncertain and expert validation is lacking. Our survey revealed considerable inaccuracies and incompleteness in AI-generated responses on FMF, with poor intra- and inter-rater reliability. Human validation remains crucial when handling AI-generated medical information.

Source journal: Pediatric Rheumatology (PEDIATRICS-RHEUMATOLOGY)
CiteScore: 4.10
Self-citation rate: 8.00%
Articles published: 95
Review time: >12 weeks
Journal description: Pediatric Rheumatology is an open access, peer-reviewed, online journal encompassing all aspects of clinical and basic research related to pediatric rheumatology and allied subjects. The journal's scope of diseases and syndromes includes musculoskeletal pain syndromes, rheumatic fever and post-streptococcal syndromes, juvenile idiopathic arthritis, systemic lupus erythematosus, juvenile dermatomyositis, local and systemic scleroderma, Kawasaki disease, Henoch-Schönlein purpura and other vasculitides, sarcoidosis, inherited musculoskeletal syndromes, autoinflammatory syndromes, and others.