Do ChatGPT and Gemini Provide Appropriate Recommendations for Pediatric Orthopaedic Conditions?

IF 1.5 · CAS Zone 3 (Medicine) · JCR Q3 (Orthopedics)

Journal of Pediatric Orthopaedics Pub Date : 2025-01-01 Epub Date: 2024-08-22 DOI:10.1097/BPO.0000000000002797

Sean Pirkle, JaeWon Yang, Todd J Blumberg


Citations: 0

Abstract


Do ChatGPT and Gemini Provide Appropriate Recommendations for Pediatric Orthopaedic Conditions?

Background: Artificial intelligence (AI), and in particular large language models (LLMs) such as Chat Generative Pre-Trained Transformer (ChatGPT) and Gemini, has given patients additional resources for researching the management of health conditions, both for their own edification and to advocate for their children's care. The accuracy of these models, however, and the sources from which they draw conclusions, have been largely unstudied in pediatric orthopaedics. This research aimed to assess the reliability of machine learning tools in providing appropriate recommendations for the care of common pediatric orthopaedic conditions.

Methods: ChatGPT and Gemini were queried using plain language generated from the American Academy of Orthopaedic Surgeons (AAOS) Clinical Practice Guidelines (CPGs) listed on the Pediatric Orthopedic Society of North America (POSNA) web page. Two independent reviewers assessed the accuracy of the responses, and chi-square analyses were used to compare the 2 LLMs. Inter-rater reliability was calculated via Cohen's Kappa coefficient. If research studies were cited, attempts were made to assess their legitimacy by searching the PubMed and Google Scholar databases.
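The two statistics named in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings and the 2x2 counts below are hypothetical and are not the study's data, the function names are invented here, and the abstract does not say which chi-square variant or table layout the authors used (an uncorrected Pearson test on an agree / not-agree table is assumed).

```python
import math

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rate, summed over labels.
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# HYPOTHETICAL ratings of eight responses by two reviewers (not study data).
reviewer_1 = ["agree"] * 5 + ["neutral"] * 2 + ["disagree"]
reviewer_2 = ["agree"] * 4 + ["neutral"] * 3 + ["disagree"]
kappa = cohens_kappa(reviewer_1, reviewer_2)

# HYPOTHETICAL counts: model A 16 agree / 8 not, model B 17 agree / 7 not.
stat, p_value = chi_square_2x2(16, 8, 17, 7)

print(f"kappa = {kappa:.3f}")
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percent-agreement figure when two reviewers grade the same responses.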

Results: ChatGPT and Gemini performed similarly, agreeing with the AAOS CPGs at rates of 67% and 69%, respectively. No significant differences were observed in the performance between the 2 LLMs. ChatGPT did not reference specific studies in any response, whereas Gemini referenced a total of 16 research papers in 6 of 24 responses. Twelve of the 16 referenced studies contained errors: 7 could not be identified, and 5 contained discrepancies in publication year, journal, or attribution of authorship.
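The citation figures in the Results can be checked with simple arithmetic. The sketch below uses only the counts stated in the abstract; the variable names are illustrative.

```python
# Counts as reported in the abstract for Gemini.
gemini_responses = 24
responses_with_citations = 6
cited_studies = 16
unidentifiable = 7   # could not be found in PubMed or Google Scholar
discrepant = 5       # wrong year, journal, or authorship

flawed = unidentifiable + discrepant
flawed_share = flawed / cited_studies
citing_share = responses_with_citations / gemini_responses

print(f"{flawed} of {cited_studies} cited studies had errors ({flawed_share:.0%})")
print(f"{responses_with_citations} of {gemini_responses} responses cited studies ({citing_share:.0%})")
```

So three quarters of the cited studies were flawed in some way, and all of the citations came from a quarter of Gemini's responses.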

Conclusion: The LLMs investigated were frequently aligned with the AAOS CPGs; however, the rate of neutral statements or disagreement with consensus recommendations was substantial, and the sources cited frequently contained errors. These findings suggest there remains room for growth and transparency in the development of the models that power AI, and they may not yet represent the best source of up-to-date healthcare information for patients or providers.


Source journal

Journal of Pediatric Orthopaedics (category: Medicine, Pediatrics)

CiteScore: 3.30
Self-citation rate: 17.60%
Articles published: 512
Review time: 6 months

About the journal: Journal of Pediatric Orthopaedics is a leading journal that focuses specifically on traumatic injuries to give you hands-on coverage of a fast-growing field. You'll get articles that cover everything from the nature of injury to the effects of new drug therapies; everything from recommendations for more effective surgical approaches to the latest laboratory findings.