Can AI outperform professional writers in summarizing foot and ankle literature?

Seth L. Warren, DPM; Steven R. Cooperman, DPM, MBA, AACFAS

Foot & Ankle Surgery (New York, N.Y.), 5(3), Article 100522
Publication date: June 6, 2025
DOI: 10.1016/j.fastrc.2025.100522
URL: https://www.sciencedirect.com/science/article/pii/S2667396725000576
Abstract
This study evaluates the performance of an advanced large language model in summarizing scientific literature within the specialized field of foot and ankle surgery. Building upon prior work that demonstrated ChatGPT-3.5's comparability to podiatric residents, this investigation compares ChatGPT-4.5 directly against paid, professionally written summaries sourced from Foot and Ankle Quarterly. Ten original research articles were summarized by ChatGPT-4.5 and matched with corresponding professionally written summaries. Quantitative analysis using BLEU and ROUGE metrics assessed textual similarity, while Flesch Reading Ease and Flesch-Kincaid Grade Level scores evaluated readability. A qualitative preference survey was conducted among three blinded, fellowship-trained foot and ankle surgeons. AI-generated summaries were preferred in 73.33% of comparisons and contained no factual inaccuracies. Although professionally written summaries scored as more readable, AI-generated summaries maintained more consistent language complexity. ROUGE scores suggested substantial content overlap between AI-generated and reference summaries, whereas lower BLEU scores reflected differences in phrasing, possibly attributable to the shorter length of the AI summaries. These findings suggest ChatGPT-4.5 can reliably and efficiently produce accurate, high-quality summaries, potentially surpassing paid academic writers in certain domains. Broader implications include improved efficiency in academic research and literature review. Continued investigation and oversight are necessary to guide the responsible integration of AI tools into clinical and scholarly workflows.

Level of evidence
III, comparative study
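For readers unfamiliar with the four metrics named in the abstract, the sketch below shows how such a pairwise comparison could be computed in Python. The paper does not specify its tooling, so the package choices (nltk, rouge-score, textstat) and the compare_summaries helper are illustrative assumptions, not the authors' pipeline.

# Minimal sketch of the kind of metric pipeline the study describes:
# BLEU and ROUGE for textual similarity against a professional reference
# summary, Flesch scores for readability of each summary on its own.
# Package and function choices here are assumptions; the paper does not
# state which implementations were used.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
import textstat

def compare_summaries(ai_summary: str, professional_summary: str) -> dict:
    # BLEU: n-gram precision of the AI summary against the professional
    # reference; smoothing avoids zero scores on short texts.
    bleu = sentence_bleu(
        [professional_summary.split()],
        ai_summary.split(),
        smoothing_function=SmoothingFunction().method1,
    )
    # ROUGE-1 / ROUGE-L: unigram and longest-common-subsequence overlap,
    # reported as F-measures.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge = scorer.score(professional_summary, ai_summary)
    return {
        "bleu": bleu,
        "rouge1_f": rouge["rouge1"].fmeasure,
        "rougeL_f": rouge["rougeL"].fmeasure,
        # Readability: higher Flesch Reading Ease means easier text;
        # Flesch-Kincaid approximates a U.S. school grade level.
        "ai_flesch_ease": textstat.flesch_reading_ease(ai_summary),
        "ai_fk_grade": textstat.flesch_kincaid_grade(ai_summary),
        "pro_flesch_ease": textstat.flesch_reading_ease(professional_summary),
        "pro_fk_grade": textstat.flesch_kincaid_grade(professional_summary),
    }

Note how these metrics can diverge: a short AI summary that reuses the reference's vocabulary but little of its exact phrasing would score well on ROUGE-1 yet poorly on BLEU, which is consistent with the pattern the abstract reports.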