ChatGPT-4 Generates More Accurate and Complete Responses to Common Patient Questions About Anterior Cruciate Ligament Reconstruction Than Google’s Search Engine
Michael A. Gaudiani M.D., Joshua P. Castle M.D., Muhammad J. Abbas M.D., Brittaney A. Pratt B.S., Marquisha D. Myles B.S., Vasilios Moutzouros M.D., T. Sean Lynch M.D.
{"title":"ChatGPT-4 Generates More Accurate and Complete Responses to Common Patient Questions About Anterior Cruciate Ligament Reconstruction Than Google’s Search Engine","authors":"Michael A. Gaudiani M.D. , Joshua P. Castle M.D. , Muhammad J. Abbas M.D. , Brittaney A. Pratt B.S. , Marquisha D. Myles B.S. , Vasilios Moutzouros M.D. , T. Sean Lynch M.D.","doi":"10.1016/j.asmr.2024.100939","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><p>To replicate a patient’s internet search to evaluate ChatGPT’s appropriateness in answering common patient questions about anterior cruciate ligament reconstruction compared with a Google web search.</p></div><div><h3>Methods</h3><p>A Google web search was performed by searching the term “anterior cruciate ligament reconstruction.” The top 20 frequently asked questions and responses were recorded. The prompt “What are the 20 most popular patient questions related to ‘anterior cruciate ligament reconstruction?’” was input into ChatGPT and questions and responses were recorded. Questions were classified based on the Rothwell system and responses assessed via Flesch-Kincaid Grade Level, correctness, and completeness were for both Google web search and ChatGPT.</p></div><div><h3>Results</h3><p>Three of 20 (15%) questions were similar between Google web search and ChatGPT. The most common question types among the Google web search were value (8/20, 40%), fact (7/20, 35%), and policy (5/20, 25%). The most common question types amongst the ChatGPT search were fact (12/20, 60%), policy (6/20, 30%), and value (2/20, 10%). Mean Flesch-Kincaid Grade Level for Google web search responses was significantly lower (11.8 ± 3.8 vs 14.3 ± 2.2; <em>P</em> = .003) than for ChatGPT responses. The mean correctness for Google web search question answers was 1.47 ± 0.5, and mean completeness was 1.36 ± 0.5. Mean correctness for ChatGPT answers was 1.8 ± 0.4 and mean completeness was 1.9 ± 0.3, which were both significantly greater than Google web search answers (<em>P</em> = .03 and <em>P</em> = .0003).</p></div><div><h3>Conclusions</h3><p>ChatGPT-4 generated more accurate and complete responses to common patient questions about anterior cruciate ligament reconstruction than Google’s search engine.</p></div><div><h3>Clinical Relevance</h3><p>The use of artificial intelligence such as ChatGPT is expanding. It is important to understand the quality of information as well as how the results of ChatGPT queries compare with those from Google web searches</p></div>","PeriodicalId":34631,"journal":{"name":"Arthroscopy Sports Medicine and Rehabilitation","volume":"6 3","pages":"Article 100939"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666061X24000579/pdfft?md5=d7ac14145b1db8d87e6374fcf43f0d64&pid=1-s2.0-S2666061X24000579-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Arthroscopy Sports Medicine and Rehabilitation","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666061X24000579","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
Abstract
Purpose
To replicate a patient’s internet search in order to evaluate the appropriateness of ChatGPT’s answers to common patient questions about anterior cruciate ligament reconstruction compared with a Google web search.
Methods
A Google web search was performed by searching the term “anterior cruciate ligament reconstruction.” The top 20 frequently asked questions and responses were recorded. The prompt “What are the 20 most popular patient questions related to ‘anterior cruciate ligament reconstruction?’” was input into ChatGPT, and the questions and responses were recorded. Questions were classified based on the Rothwell system, and responses from both Google web search and ChatGPT were assessed for Flesch-Kincaid Grade Level, correctness, and completeness.
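As an illustration of the readability metric named above, the following is a minimal Python sketch of the Flesch-Kincaid Grade Level formula (0.39 × words/sentences + 11.8 × syllables/words − 15.59). The syllable counter here is a rough vowel-group heuristic for demonstration only; the abstract does not specify what tooling the authors used.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels and
    # subtract a trailing silent "e". Real tools use dictionaries.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / max(len(sentences), 1)
            + 11.8 * syllables / max(len(words), 1)
            - 15.59)

# Example: score a hypothetical response text
print(round(flesch_kincaid_grade(
    "Anterior cruciate ligament reconstruction replaces a torn "
    "ligament with a tissue graft. Recovery often takes months."), 1))
```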
Results
Three of 20 (15%) questions were similar between the Google web search and ChatGPT. The most common question types among the Google web search results were value (8/20, 40%), fact (7/20, 35%), and policy (5/20, 25%); among the ChatGPT results, they were fact (12/20, 60%), policy (6/20, 30%), and value (2/20, 10%). The mean Flesch-Kincaid Grade Level of Google web search responses was significantly lower than that of ChatGPT responses (11.8 ± 3.8 vs 14.3 ± 2.2; P = .003). Mean correctness for Google web search answers was 1.47 ± 0.5, and mean completeness was 1.36 ± 0.5. Mean correctness for ChatGPT answers was 1.8 ± 0.4, and mean completeness was 1.9 ± 0.3, both significantly greater than for Google web search answers (P = .03 and P = .0003, respectively).
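A minimal sketch of the kind of group comparison reported above. The abstract does not state which statistical test the authors used; an independent two-sample t-test is assumed here, and the per-question score arrays are fabricated placeholders drawn to roughly match the reported means and standard deviations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical grader scores for 20 questions each, clipped to a 1-2 scale
google_correctness = rng.normal(1.47, 0.5, 20).clip(1, 2)
chatgpt_correctness = rng.normal(1.80, 0.4, 20).clip(1, 2)

# Independent two-sample t-test (assumed; not specified in the abstract)
t, p = stats.ttest_ind(chatgpt_correctness, google_correctness)
print(f"mean Google = {google_correctness.mean():.2f}, "
      f"mean ChatGPT = {chatgpt_correctness.mean():.2f}, p = {p:.3f}")
```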
Conclusions
ChatGPT-4 generated more accurate and complete responses to common patient questions about anterior cruciate ligament reconstruction than Google’s search engine.
Clinical Relevance
The use of artificial intelligence tools such as ChatGPT is expanding. It is important to understand the quality of the information they provide, as well as how the results of ChatGPT queries compare with those from Google web searches.