Hayden P. Baker, Sarthak Aggarwal, Senthooran Kalidoss, Matthew Hess, Rex Haydon, Jason A. Strelzow
{"title":"ChatGPT-4在骨科肿瘤中的诊断准确性:与住院医师的比较研究","authors":"Hayden P. Baker, Sarthak Aggarwal, Senthooran Kalidoss, Matthew Hess, Rex Haydon, Jason A. Strelzow","doi":"10.1016/j.knee.2025.04.004","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Artificial intelligence (AI) is increasingly being explored for its potential role in medical diagnostics. ChatGPT-4, a large language model (LLM) with image analysis capabilities, may assist in histopathological interpretation, but its accuracy in musculoskeletal oncology remains untested. This study evaluates ChatGPT-4′s diagnostic accuracy in identifying musculoskeletal tumors from histology slides compared to orthopedic surgery residents.</div></div><div><h3>Methods</h3><div>A comparative study was conducted using 24 histology slides randomly selected from an orthopedic oncology registry. Five teams of orthopedic surgery residents (PGY-1 to PGY-5) participated in a diagnostic competition, providing their best diagnosis for each slide. ChatGPT-4 was tested separately using identical histology images and clinical vignettes, with two independent attempts. Statistical analyses, including one-way ANOVA and independent t-tests were performed to compare diagnostic accuracy.</div></div><div><h3>Results</h3><div>Orthopedic residents significantly outperformed ChatGPT-4 in diagnosing musculoskeletal tumors. The mean diagnostic accuracy among resident teams was 55%, while ChatGPT-4 achieved 25% on its first attempt and 33% on its second attempt. One-way ANOVA revealed a significant difference in accuracy across groups (<em>F</em> = 8.51, <em>p</em> = 0.033). Independent t-tests confirmed that residents performed significantly better than ChatGPT-4 (<em>t</em> = 5.80, <em>p</em> = 0.0004 for first attempt; <em>t</em> = 4.25, <em>p</em> = 0.0028 for second attempt). 
Both residents and ChatGPT-4 struggled with specific cases, particularly soft tissue sarcomas.</div></div><div><h3>Conclusions</h3><div>ChatGPT-4 demonstrated limited accuracy in interpreting histopathological slides compared to orthopedic residents. While AI holds promise for medical diagnostics, its current capabilities in musculoskeletal oncology remain insufficient for independent clinical use. These findings should be viewed as exploratory rather than confirmatory, and further research with larger, more diverse datasets is needed to assess AI’s role in histopathology. Future studies should investigate AI-assisted workflows, refine prompt engineering, and explore AI models specifically trained for histopathological diagnosis.</div></div>","PeriodicalId":56110,"journal":{"name":"Knee","volume":"55 ","pages":"Pages 153-160"},"PeriodicalIF":1.6000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diagnostic accuracy of ChatGPT-4 in orthopedic oncology: a comparative study with residents\",\"authors\":\"Hayden P. Baker, Sarthak Aggarwal, Senthooran Kalidoss, Matthew Hess, Rex Haydon, Jason A. Strelzow\",\"doi\":\"10.1016/j.knee.2025.04.004\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Artificial intelligence (AI) is increasingly being explored for its potential role in medical diagnostics. ChatGPT-4, a large language model (LLM) with image analysis capabilities, may assist in histopathological interpretation, but its accuracy in musculoskeletal oncology remains untested. This study evaluates ChatGPT-4′s diagnostic accuracy in identifying musculoskeletal tumors from histology slides compared to orthopedic surgery residents.</div></div><div><h3>Methods</h3><div>A comparative study was conducted using 24 histology slides randomly selected from an orthopedic oncology registry. 
Five teams of orthopedic surgery residents (PGY-1 to PGY-5) participated in a diagnostic competition, providing their best diagnosis for each slide. ChatGPT-4 was tested separately using identical histology images and clinical vignettes, with two independent attempts. Statistical analyses, including one-way ANOVA and independent t-tests were performed to compare diagnostic accuracy.</div></div><div><h3>Results</h3><div>Orthopedic residents significantly outperformed ChatGPT-4 in diagnosing musculoskeletal tumors. The mean diagnostic accuracy among resident teams was 55%, while ChatGPT-4 achieved 25% on its first attempt and 33% on its second attempt. One-way ANOVA revealed a significant difference in accuracy across groups (<em>F</em> = 8.51, <em>p</em> = 0.033). Independent t-tests confirmed that residents performed significantly better than ChatGPT-4 (<em>t</em> = 5.80, <em>p</em> = 0.0004 for first attempt; <em>t</em> = 4.25, <em>p</em> = 0.0028 for second attempt). Both residents and ChatGPT-4 struggled with specific cases, particularly soft tissue sarcomas.</div></div><div><h3>Conclusions</h3><div>ChatGPT-4 demonstrated limited accuracy in interpreting histopathological slides compared to orthopedic residents. While AI holds promise for medical diagnostics, its current capabilities in musculoskeletal oncology remain insufficient for independent clinical use. These findings should be viewed as exploratory rather than confirmatory, and further research with larger, more diverse datasets is needed to assess AI’s role in histopathology. 
Future studies should investigate AI-assisted workflows, refine prompt engineering, and explore AI models specifically trained for histopathological diagnosis.</div></div>\",\"PeriodicalId\":56110,\"journal\":{\"name\":\"Knee\",\"volume\":\"55 \",\"pages\":\"Pages 153-160\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knee\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0968016025000766\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knee","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0968016025000766","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Diagnostic accuracy of ChatGPT-4 in orthopedic oncology: a comparative study with residents
Background
Artificial intelligence (AI) is increasingly being explored for its potential role in medical diagnostics. ChatGPT-4, a large language model (LLM) with image analysis capabilities, may assist in histopathological interpretation, but its accuracy in musculoskeletal oncology remains untested. This study evaluates ChatGPT-4's diagnostic accuracy in identifying musculoskeletal tumors from histology slides, compared with that of orthopedic surgery residents.
Methods
A comparative study was conducted using 24 histology slides randomly selected from an orthopedic oncology registry. Five teams of orthopedic surgery residents (PGY-1 to PGY-5) participated in a diagnostic competition, providing their best diagnosis for each slide. ChatGPT-4 was tested separately using identical histology images and clinical vignettes, with two independent attempts. Statistical analyses, including one-way ANOVA and independent t-tests, were performed to compare diagnostic accuracy.
Results
Orthopedic residents significantly outperformed ChatGPT-4 in diagnosing musculoskeletal tumors. The mean diagnostic accuracy among resident teams was 55%, while ChatGPT-4 achieved 25% on its first attempt and 33% on its second attempt. One-way ANOVA revealed a significant difference in accuracy across groups (F = 8.51, p = 0.033). Independent t-tests confirmed that residents performed significantly better than ChatGPT-4 (t = 5.80, p = 0.0004 for first attempt; t = 4.25, p = 0.0028 for second attempt). Both residents and ChatGPT-4 struggled with specific cases, particularly soft tissue sarcomas.
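The resident-versus-ChatGPT comparison above can be sketched with SciPy. Note the caveats: the per-team accuracy values below are hypothetical placeholders (the abstract reports only the 55% mean across five teams), and framing the comparison as a one-sample t-test of the five team accuracies against each ChatGPT-4 attempt is an assumption about the analysis, not a statement of the authors' exact method. The resulting statistics are therefore illustrative and will not reproduce the published t-values.

```python
from scipy import stats

# Hypothetical per-team accuracies as fractions of 24 slides correct;
# chosen so the mean is 55%, matching the reported summary statistic.
team_acc = [14/24, 12/24, 13/24, 15/24, 12/24]

# One plausible reading of the reported t-tests: one-sample tests of the
# five resident-team accuracies against each ChatGPT-4 attempt's accuracy.
t1, p1 = stats.ttest_1samp(team_acc, 6/24)  # first attempt, 25% (6/24)
t2, p2 = stats.ttest_1samp(team_acc, 8/24)  # second attempt, 33% (8/24)

print(f"vs attempt 1: t={t1:.2f}, p={p1:.4f}")
print(f"vs attempt 2: t={t2:.2f}, p={p2:.4f}")
```

Because ChatGPT-4 contributes a single accuracy per attempt rather than a distribution, a one-sample test is one natural structure; the paper may instead have compared per-slide correctness, which would call for a different test.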
Conclusions
ChatGPT-4 demonstrated limited accuracy in interpreting histopathological slides compared to orthopedic residents. While AI holds promise for medical diagnostics, its current capabilities in musculoskeletal oncology remain insufficient for independent clinical use. These findings should be viewed as exploratory rather than confirmatory, and further research with larger, more diverse datasets is needed to assess AI’s role in histopathology. Future studies should investigate AI-assisted workflows, refine prompt engineering, and explore AI models specifically trained for histopathological diagnosis.
Journal introduction:
The Knee is an international journal publishing studies on the clinical treatment and fundamental biomechanical characteristics of this joint. The aim of the journal is to provide a vehicle relevant to surgeons, biomedical engineers, imaging specialists, materials scientists, rehabilitation personnel and all those with an interest in the knee.
The topics covered include, but are not limited to:
• Anatomy, physiology, morphology and biochemistry;
• Biomechanical studies;
• Advances in the development of prosthetic, orthotic and augmentation devices;
• Imaging and diagnostic techniques;
• Pathology;
• Trauma;
• Surgery;
• Rehabilitation.