Microsoft Copilot Provides More Accurate and Reliable Information About Anterior Cruciate Ligament Injury and Repair Than ChatGPT and Google Gemini; However, No Resource Was Overall the Best
{"title":"Microsoft Copilot Provides More Accurate and Reliable Information About Anterior Cruciate Ligament Injury and Repair Than ChatGPT and Google Gemini; However, No Resource Was Overall the Best","authors":"Suhasini Gupta B.S. , Rae Tarapore M.D. , Brett Haislup M.D. , Allison Fillar M.D.","doi":"10.1016/j.asmr.2024.101043","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>To analyze and compare the quality, accuracy, and readability of information regarding anterior cruciate ligament (ACL) injury and reconstruction provided by various artificial intelligence AI interfaces (Google Gemini, Microsoft Copilot, and OpenAI ChatGPT).</div></div><div><h3>Methods</h3><div>Twenty questions regarding ACL reconstruction were inputted into ChatGPT 3.5, Gemini, and the more precise subinterface within Copilot and were categorized on the basis of the Rothwell criteria into Fact, Policy, and Value. The answers generated were analyzed using the DISCERN scale, JAMA benchmark criteria, and Flesch-Kincaid Reading Ease Score and Grade Level. The citations provided by Gemini and Copilot were further categorized by source of citation.</div></div><div><h3>Results</h3><div>All 3 AI interfaces generated DISCERN scores (≥50) demonstrating “good” quality of information except for Policy and Value by Copilot which were scored as “excellent” (≥70). The information provided by Copilot demonstrated greater reliability, with a JAMA benchmark criterion of 3 (of 4) as compared with Gemini (1) and ChatGPT (0). In terms of readability, the Flesch-Kincaid Reading Ease Score scores of all 3 sources were <30, apart from Fact by Copilot (31.9) demonstrating very complex answers. Similarly, all Flesch-Kincaid Grade Level scores were >13, indicating a minimum readability level of college level or college graduate. Finally, both Copilot and Gemini had a majority of references provided by journals (65.6% by Gemini and 75.4% by Copilot), followed by academic sources, whereas Copilot provided a greater number of overall citations (163) as compared with Gemini (64).</div></div><div><h3>Conclusions</h3><div>Microsoft Copilot was a better resource for patients to learn about ACL injuries and reconstruction compared with Google Gemini or OpenAI ChatGPT in terms of quality of information, reliability, and readability. The answers provided by LLMs are highly complex and no resource was overall the best.</div></div><div><h3>Clinical Relevance</h3><div>As artificial intelligence models continually evolve and demonstrate increased potential for answering complex surgical questions, it is important to investigate the quality and usefulness of the responses for patients. Although these resources may be helpful, they should not be used as a substitute for any discussions with health care providers.</div></div>","PeriodicalId":34631,"journal":{"name":"Arthroscopy Sports Medicine and Rehabilitation","volume":"7 2","pages":"Article 101043"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Arthroscopy Sports Medicine and Rehabilitation","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666061X2400186X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
Abstract
Purpose
To analyze and compare the quality, accuracy, and readability of information regarding anterior cruciate ligament (ACL) injury and reconstruction provided by various artificial intelligence (AI) interfaces (Google Gemini, Microsoft Copilot, and OpenAI ChatGPT).
Methods
Twenty questions regarding ACL reconstruction were entered into ChatGPT 3.5, Gemini, and Copilot's "More Precise" conversation mode and were categorized, on the basis of the Rothwell criteria, as Fact, Policy, or Value. The generated answers were analyzed using the DISCERN scale, the JAMA benchmark criteria, and the Flesch-Kincaid Reading Ease Score and Grade Level. The citations provided by Gemini and Copilot were further categorized by source.
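For context, the readability metrics used here follow the standard Flesch formulas. The sketch below is not from the study; its naive vowel-group syllable counter is an illustrative assumption (published readability tools use more careful syllabification), but it shows how the Reading Ease Score and Grade Level are computed:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups; every word has >= 1 syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)

    # Flesch Reading Ease: higher = easier; scores <30 read as "very difficult."
    fres = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)
    # Flesch-Kincaid Grade Level: approximate U.S. school grade; >13 = college level.
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59
    return fres, fkgl

if __name__ == "__main__":
    answer = "The anterior cruciate ligament stabilizes the knee during rotation."
    fres, fkgl = readability(answer)
    print(f"Reading Ease: {fres:.1f}, Grade Level: {fkgl:.1f}")
```

Long sentences and polysyllabic terminology drive the Reading Ease Score down and the Grade Level up, which is why dense surgical answers score as "very difficult."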
Results
All 3 AI interfaces generated DISCERN scores (≥50) demonstrating "good" quality of information, except for the Policy and Value categories from Copilot, which scored "excellent" (≥70). The information provided by Copilot demonstrated greater reliability, with a JAMA benchmark score of 3 (of 4) compared with Gemini (1) and ChatGPT (0). In terms of readability, the Flesch-Kincaid Reading Ease Scores of all 3 sources were <30, apart from the Fact category from Copilot (31.9), indicating very complex answers. Similarly, all Flesch-Kincaid Grade Level scores were >13, indicating a minimum reading level of college or college graduate. Finally, a majority of the references provided by both Copilot and Gemini came from journals (65.6% for Gemini and 75.4% for Copilot), followed by academic sources, and Copilot provided more citations overall (163) than Gemini (64).
Conclusions
Microsoft Copilot was a better resource for patients to learn about ACL injuries and reconstruction than Google Gemini or OpenAI ChatGPT in terms of quality of information, reliability, and readability. However, the answers provided by all 3 large language models (LLMs) were highly complex, and no single resource was the best overall.
Clinical Relevance
As artificial intelligence models continually evolve and demonstrate increasing potential for answering complex surgical questions, it is important to investigate the quality and usefulness of their responses for patients. Although these resources may be helpful, they should not be used as a substitute for discussions with health care providers.