Jonathan Liu, Mohammad Daher, Jacob Laperche, Noah Gilreath, Edward J. Testa, Mouhanad M. El-Othamni, Thomas J. Barrett, Valentin Antoci Jr.
{"title":"ChatGPT与专家关节置换外科医生在全膝关节置换术患者咨询中的比较","authors":"Jonathan Liu, Mohammad Daher, Jacob Laperche, Noah Gilreath, Edward J. Testa, Mouhanad M. El-Othamni, Thomas J. Barrett, Valentin Antoci Jr.","doi":"10.1016/j.knee.2025.03.005","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>This study aimed to assess the effectiveness of AI compared directly with expert arthroplasty surgeons regarding patient counseling for total knee arthroplasty (TKA).</div></div><div><h3>Methods</h3><div>A set of 10 commonly asked generic and nonspecific, single-step patient questions were selected based on review of existing patient resources and expert consensus. Responses were then collected from ChatGPT-4.0 as well as five expert arthroplasty attendings at our institution. A, B, C, D, and E represent attending responses, while F represents the ChatGPT responses. The collected responses were then blinded and independently assessed by the same five arthroplasty surgeons using a five-point Likert scale in four performance areas including empathy, accuracy, completeness, and overall quality. Average scores for each question were determined.</div></div><div><h3>Results</h3><div>Set F, the ChatGPT answers scored significantly higher than sets A, B, and D in all categories. However, set F did not differ significantly from set C, and E in all the categories. The mean score for set D was above a mean of 4, above neutral, for all four categories. This was only the case for sets C and E.<!--> <!-->When the attendings scores were combined and compared with ChatGPT, the latter had higher ratings for empathy (4.4 vs. 3.5), accuracy (4.4 vs. 3.7), completeness (4.4 vs. 3.5), and overall quality (4.4 vs. 3.6) (<em>P</em> < 0.001).</div></div><div><h3>Conclusion</h3><div>A preliminary evaluation of ChatGPT-4.0 shows potential for large language AI models to serve as a supplementary resource of patients considering TKA.</div></div>","PeriodicalId":56110,"journal":{"name":"Knee","volume":"55 ","pages":"Pages 12-17"},"PeriodicalIF":1.6000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ChatGPT versus expert arthroplasty surgeons in total knee arthroplasty patient counseling\",\"authors\":\"Jonathan Liu, Mohammad Daher, Jacob Laperche, Noah Gilreath, Edward J. Testa, Mouhanad M. El-Othamni, Thomas J. Barrett, Valentin Antoci Jr.\",\"doi\":\"10.1016/j.knee.2025.03.005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>This study aimed to assess the effectiveness of AI compared directly with expert arthroplasty surgeons regarding patient counseling for total knee arthroplasty (TKA).</div></div><div><h3>Methods</h3><div>A set of 10 commonly asked generic and nonspecific, single-step patient questions were selected based on review of existing patient resources and expert consensus. Responses were then collected from ChatGPT-4.0 as well as five expert arthroplasty attendings at our institution. A, B, C, D, and E represent attending responses, while F represents the ChatGPT responses. The collected responses were then blinded and independently assessed by the same five arthroplasty surgeons using a five-point Likert scale in four performance areas including empathy, accuracy, completeness, and overall quality. 
Average scores for each question were determined.</div></div><div><h3>Results</h3><div>Set F, the ChatGPT answers scored significantly higher than sets A, B, and D in all categories. However, set F did not differ significantly from set C, and E in all the categories. The mean score for set D was above a mean of 4, above neutral, for all four categories. This was only the case for sets C and E.<!--> <!-->When the attendings scores were combined and compared with ChatGPT, the latter had higher ratings for empathy (4.4 vs. 3.5), accuracy (4.4 vs. 3.7), completeness (4.4 vs. 3.5), and overall quality (4.4 vs. 3.6) (<em>P</em> < 0.001).</div></div><div><h3>Conclusion</h3><div>A preliminary evaluation of ChatGPT-4.0 shows potential for large language AI models to serve as a supplementary resource of patients considering TKA.</div></div>\",\"PeriodicalId\":56110,\"journal\":{\"name\":\"Knee\",\"volume\":\"55 \",\"pages\":\"Pages 12-17\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knee\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0968016025000651\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knee","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0968016025000651","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
ChatGPT versus expert arthroplasty surgeons in total knee arthroplasty patient counseling
Background
This study aimed to assess the effectiveness of artificial intelligence (AI), compared directly with expert arthroplasty surgeons, in patient counseling for total knee arthroplasty (TKA).
Methods
A set of 10 commonly asked, generic, nonspecific, single-step patient questions was selected based on a review of existing patient resources and expert consensus. Responses were then collected from ChatGPT-4.0 as well as from five expert arthroplasty attendings at our institution. Sets A, B, C, D, and E represent the attending responses, while set F represents the ChatGPT responses. The collected responses were then blinded and independently assessed by the same five arthroplasty surgeons using a five-point Likert scale in four performance areas: empathy, accuracy, completeness, and overall quality. Average scores for each question were determined.
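The abstract does not include the study's analysis pipeline. The following is a minimal sketch, assuming entirely hypothetical rating data, of how blinded five-point Likert ratings from five raters could be averaged per question and per performance area for each response set (A-E attendings, F ChatGPT), mirroring the design described in the Methods. All names and values are illustrative placeholders, not the authors' data or code.

```python
# Illustrative sketch only: scores below are dummy values so the example runs.
from statistics import mean

CATEGORIES = ["empathy", "accuracy", "completeness", "overall_quality"]
RESPONSE_SETS = ["A", "B", "C", "D", "E", "F"]  # A-E attendings, F ChatGPT
N_QUESTIONS = 10

# ratings[set][question][category] -> the five blinded raters' 1-5 scores
# (filled with placeholder values here purely for demonstration)
ratings = {
    s: [{c: [3, 4, 4, 5, 4] for c in CATEGORIES} for _ in range(N_QUESTIONS)]
    for s in RESPONSE_SETS
}

def question_averages(response_set: str) -> list[dict[str, float]]:
    """Average the five raters' scores for each question and category."""
    return [{c: mean(q[c]) for c in CATEGORIES} for q in ratings[response_set]]

def set_mean(response_set: str, category: str) -> float:
    """Overall mean for one response set in one performance area."""
    return mean(q[category] for q in question_averages(response_set))

for s in RESPONSE_SETS:
    print(s, {c: round(set_mean(s, c), 2) for c in CATEGORIES})
```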
Results
Set F, the ChatGPT answers, scored significantly higher than sets A, B, and D in all categories. However, set F did not differ significantly from sets C and E in any category. The mean score for set F was above 4 (above neutral) in all four categories; among the attending sets, this was only the case for sets C and E. When the attendings' scores were combined and compared with ChatGPT, the latter had higher ratings for empathy (4.4 vs. 3.5), accuracy (4.4 vs. 3.7), completeness (4.4 vs. 3.5), and overall quality (4.4 vs. 3.6) (P < 0.001).
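The abstract reports P < 0.001 for the combined attending-versus-ChatGPT comparison but does not name the statistical test. The sketch below, with dummy per-question scores, uses a Mann-Whitney U test purely as one plausible nonparametric choice for ordinal Likert-type data; this is an assumption for illustration, not the authors' stated method.

```python
# Illustrative sketch only: test choice and score values are assumptions.
from scipy.stats import mannwhitneyu

# Hypothetical per-question averages: pooled attendings (sets A-E) vs. set F
attending_scores = [3.5, 3.7, 3.4, 3.6, 3.8, 3.5, 3.6, 3.4, 3.7, 3.5]  # dummy
chatgpt_scores = [4.4, 4.3, 4.5, 4.4, 4.2, 4.5, 4.4, 4.3, 4.4, 4.5]    # dummy

stat, p_value = mannwhitneyu(chatgpt_scores, attending_scores, alternative="greater")
print(f"U = {stat:.1f}, one-sided P = {p_value:.4f}")
```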
Conclusion
A preliminary evaluation of ChatGPT-4.0 shows the potential for large language AI models to serve as a supplementary resource for patients considering TKA.
Journal introduction:
The Knee is an international journal publishing studies on the clinical treatment and fundamental biomechanical characteristics of this joint. The aim of the journal is to provide a vehicle relevant to surgeons, biomedical engineers, imaging specialists, materials scientists, rehabilitation personnel and all those with an interest in the knee.
The topics covered include, but are not limited to:
• Anatomy, physiology, morphology and biochemistry;
• Biomechanical studies;
• Advances in the development of prosthetic, orthotic and augmentation devices;
• Imaging and diagnostic techniques;
• Pathology;
• Trauma;
• Surgery;
• Rehabilitation.