Artificial Intelligence-Based Large Language Models Can Facilitate Patient Education.

Xochitl Bryson, Marleni Albarran, Nicole Pham, Arianne Salunga, Taylor Johnson, Grant D Hogue, Jaysson T Brooks, Kali R Tileston, Craig R Louer, Ron El-Hawary, Meghan N Imrie, James F Policy, Daniel Bouton, Arun R Hariharan, Sara Van Nortwick, Vidyadhar V Upasani, Jennifer M Bauer, Andrew Tice, John S Vorhies
{"title":"Artificial Intelligence-Based Large Language Models Can Facilitate Patient Education.","authors":"Xochitl Bryson, Marleni Albarran, Nicole Pham, Arianne Salunga, Taylor Johnson, Grant D Hogue, Jaysson T Brooks, Kali R Tileston, Craig R Louer, Ron El-Hawary, Meghan N Imrie, James F Policy, Daniel Bouton, Arun R Hariharan, Sara Van Nortwick, Vidyadhar V Upasani, Jennifer M Bauer, Andrew Tice, John S Vorhies","doi":"10.1016/j.jposna.2025.100196","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) large language models (LLMs) are becoming increasingly popular, with patients and families more likely to utilize LLM when conducting internet-based research about scoliosis. For this reason, it is vital to understand the abilities and limitations of this technology in disseminating accurate medical information. We used an expert panel to compare LLM-generated and professional society-authored answers to frequently asked questions about pediatric scoliosis.</p><p><strong>Methods: </strong>We used three publicly available LLMs to generate answers to 15 frequently asked questions (FAQs) regarding pediatric scoliosis. The FAQs were derived from the Scoliosis Research Society, the American Academy of Orthopaedic Surgeons, and the Pediatric Spine Foundation. We gave minimal training to the LLM other than specifying the response length and requesting answers at a 5th-grade reading level. A 15-question survey was distributed to an expert panel composed of pediatric spine surgeons. To determine readability, responses were inputted into an open-source calculator. The panel members were presented with an AI and a physician-generated response to a FAQ and asked to select which they preferred. They were then asked to individually grade the accuracy of responses on a Likert scale.</p><p><strong>Results: </strong>The panel members had a mean of 8.9 years of experience post-fellowship (range: 3-23 years). The panel reported nearly equivalent agreement between AI-generated and physician-generated answers. The expert panel favored professional society-written responses for 40% of questions, AI for 40%, ranked responses equally good for 13%, and saw a tie between AI and \"equally good\" for 7%. For two professional society-generated and three AI-generated responses, the error bars of the expert panel mean score for accuracy and appropriateness fell below neutral, indicating a lack of consensus and mixed opinions with the response.</p><p><strong>Conclusions: </strong>Based on the expert panel review, AI delivered accurate and appropriate answers as frequently as professional society-authored FAQ answers from professional society websites. 
AI and professional society websites were equally likely to generate answers with which the expert panel disagreed.</p><p><strong>Key concepts: </strong>(1)Large language models (LLMs) are increasingly used for generating medical information online, necessitating an evaluation of their accuracy and effectiveness compared with traditional sources.(2)An expert panel of physicians compared artificial intelligence (AI)-generated answers with professional society-authored answers to pediatric scoliosis frequently asked questions, finding that both types of answers were equally favored in terms of accuracy and appropriateness.(3)The panel reported a similar rate of disagreement with AI-generated and professional society-generated answers, indicating that both had areas of controversy.(4)Over half of the expert panel members felt they could distinguish between AI-generated and professional society-generated answers but this did not relate to their preferences.(5)While AI can support medical information dissemination, further research and improvements are needed to address its limitations and ensure high-quality, accessible patient education.</p><p><strong>Levels of evidence: </strong>IV.</p>","PeriodicalId":520850,"journal":{"name":"Journal of the Pediatric Orthopaedic Society of North America","volume":"12 ","pages":"100196"},"PeriodicalIF":0.0000,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12337203/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Pediatric Orthopaedic Society of North America","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.jposna.2025.100196","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Background: Artificial intelligence (AI) large language models (LLMs) are becoming increasingly popular, and patients and families are increasingly likely to use LLMs when conducting internet-based research about scoliosis. For this reason, it is vital to understand the abilities and limitations of this technology in disseminating accurate medical information. We used an expert panel to compare LLM-generated and professional society-authored answers to frequently asked questions about pediatric scoliosis.

Methods: We used three publicly available LLMs to generate answers to 15 frequently asked questions (FAQs) regarding pediatric scoliosis. The FAQs were derived from the Scoliosis Research Society, the American Academy of Orthopaedic Surgeons, and the Pediatric Spine Foundation. We gave the LLMs minimal training other than specifying the response length and requesting answers at a 5th-grade reading level. A 15-question survey was distributed to an expert panel composed of pediatric spine surgeons. To determine readability, responses were entered into an open-source calculator. Panel members were presented with an AI-generated and a physician-generated response to each FAQ and asked to select which they preferred. They were then asked to individually grade the accuracy of the responses on a Likert scale.
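The abstract does not name the open-source readability calculator used. As an illustration only, the following minimal Python sketch shows how the grade level of a response could be checked against the requested 5th-grade target, assuming the open-source textstat package and a Flesch-Kincaid grade-level metric; the example question and answer are hypothetical placeholders, not study data.

```python
# Illustrative sketch only: the abstract does not specify the readability tool.
# Assumes the open-source "textstat" package (pip install textstat) and a
# Flesch-Kincaid grade-level target of roughly 5th grade.
import textstat

# Hypothetical example response standing in for an LLM-generated FAQ answer.
responses = {
    "What is scoliosis?": (
        "Scoliosis means the spine curves to the side. Doctors check how big "
        "the curve is and watch it as a child grows."
    ),
}

TARGET_GRADE = 5.0  # requested 5th-grade reading level

for question, answer in responses.items():
    grade = textstat.flesch_kincaid_grade(answer)
    flag = "OK" if grade <= TARGET_GRADE else "above target"
    print(f"{question} -> grade level {grade:.1f} ({flag})")
```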

Results: The panel members had a mean of 8.9 years of post-fellowship experience (range: 3-23 years). The panel reported nearly equivalent agreement with AI-generated and physician-generated answers. The expert panel preferred the professional society-written response for 40% of questions and the AI-generated response for 40%, rated the responses as equally good for 13%, and was evenly split between the AI response and "equally good" for 7%. For two professional society-generated and three AI-generated responses, the error bars of the expert panel's mean accuracy and appropriateness score fell below neutral, indicating a lack of consensus and mixed opinions about the response.
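The abstract does not detail how the error bars were computed. As a rough, assumption-laden illustration (a 5-point Likert scale with 3 as neutral, a normal-approximation 95% confidence interval, and made-up ratings rather than the study's data), a check like the sketch below could flag responses whose error bars fall below neutral.

```python
# Illustration only: hypothetical Likert ratings and a simple normal-approximation
# 95% CI. The study's actual scale, panel size, and statistics are not specified here.
import math
import statistics

NEUTRAL = 3.0  # assumed midpoint of a 5-point Likert scale

def mean_ci(ratings, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval."""
    m = statistics.mean(ratings)
    se = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return m, m - z * se, m + z * se

# Made-up panel ratings for one FAQ response.
ratings = [4, 5, 3, 2, 4, 5, 4, 3, 2, 4]

mean, lower, upper = mean_ci(ratings)
if lower < NEUTRAL:
    verdict = "error bar falls below neutral: mixed opinions, no consensus"
else:
    verdict = "panel consensus of agreement"
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f}) -> {verdict}")
```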

Conclusions: Based on the expert panel review, AI delivered accurate and appropriate answers as frequently as the professional society-authored FAQ answers published on society websites. AI and professional society websites were equally likely to generate answers with which the expert panel disagreed.

Key concepts:
(1) Large language models (LLMs) are increasingly used for generating medical information online, necessitating an evaluation of their accuracy and effectiveness compared with traditional sources.
(2) An expert panel of physicians compared artificial intelligence (AI)-generated answers with professional society-authored answers to pediatric scoliosis frequently asked questions, finding that both types of answers were equally favored in terms of accuracy and appropriateness.
(3) The panel reported a similar rate of disagreement with AI-generated and professional society-generated answers, indicating that both had areas of controversy.
(4) Over half of the expert panel members felt they could distinguish between AI-generated and professional society-generated answers, but this did not relate to their preferences.
(5) While AI can support medical information dissemination, further research and improvements are needed to address its limitations and ensure high-quality, accessible patient education.

Levels of evidence: IV.
